The paper proposes CrowdDiff, a novel crowd counting framework that treats density map generation as a denoising diffusion process. Key highlights:
CrowdDiff uses narrow Gaussian kernels to generate ground truth density maps, which helps maintain the distribution of density pixel values and improves the quality of the generated density maps.
The paper introduces a joint learning approach in which an auxiliary regression branch is used during training to estimate the crowd count from the encoder-decoder features of the denoising network, which improves feature learning.
CrowdDiff leverages the stochastic nature of diffusion models to generate multiple realizations of the crowd density map. These realizations are then fused using a systematic approach to improve the final crowd counting performance.
Instead of summing the density map values, CrowdDiff thresholds the density maps to detect individual density kernels, which is more robust to background noise than density summation.
Extensive experiments on public crowd counting datasets show that CrowdDiff outperforms state-of-the-art crowd counting methods, especially in dense crowd scenes, by generating accurate density maps and effectively leveraging the generated information for counting.
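The contrast between density summation and thresholding-based counting can be illustrated with a small toy sketch. This is not the paper's implementation: the function names `make_density_map` and `count_by_threshold`, the kernel width, the uniform background noise, the threshold of 0.05, and the use of 4-connected blob counting to detect kernels are all assumptions chosen for illustration.

```python
import math
import random


def make_density_map(points, height, width, sigma=1.0, radius=3):
    """Place one narrow Gaussian kernel, normalized to sum to 1, per head point."""
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        cells = []
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                dist_sq = (y - py) ** 2 + (x - px) ** 2
                cells.append((y, x, math.exp(-dist_sq / (2 * sigma ** 2))))
        total = sum(w for _, _, w in cells)
        for y, x, w in cells:
            density[y][x] += w / total
    return density


def count_by_threshold(density, threshold):
    """Count 4-connected blobs of pixels strictly above the threshold."""
    height, width = len(density), len(density[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or density[y0][x0] <= threshold:
                continue
            count += 1  # new blob found; flood-fill it so it is counted once
            seen[y0][x0] = True
            stack = [(y0, x0)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and density[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


# Hypothetical toy scene: four well-separated head annotations on a 32x32 grid.
points = [(5, 5), (5, 20), (20, 10), (25, 25)]
clean = make_density_map(points, 32, 32)

# Assumed noise model: small uniform background noise on every pixel.
rng = random.Random(0)
noisy = [[v + rng.uniform(0.0, 0.01) for v in row] for row in clean]

print("true count:", len(points))
print("summation on noisy map:", round(sum(map(sum, noisy)), 2))
print("thresholding on noisy map:", count_by_threshold(noisy, 0.05))
```

In this toy setting the background noise accumulates over all pixels and inflates the summed count, while the thresholded blob count is unaffected because the noise never reaches the threshold on its own. The sketch only works when kernels are narrow enough not to merge, which is consistent with the summary's emphasis on narrow Gaussian kernels.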