The paper introduces the Fuss-Free Network (FFNet), a deep learning model for crowd counting with a deliberately simple and efficient structure. The model consists of only a backbone network and a multi-scale feature fusion structure.
The multi-scale feature fusion structure consists of three branches, each equipped with a focus transition module, whose outputs are combined by concatenation. The focus transition module attends to both dynamic and static features, performing dimensionality reduction and feature extraction efficiently before the branch features are fused.
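The concatenation step can be illustrated with a minimal sketch. It is not the paper's implementation: the focus transition module is omitted, feature maps are plain nested lists of shape [channels][height][width], and the helper names (`upsample_nearest`, `concat_fuse`, `make_map`), the channel counts, and the choice of nearest-neighbour resizing are all assumptions made for illustration.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Resize a [C][H][W] nested-list feature map with nearest-neighbour sampling."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for ch in fmap
    ]


def concat_fuse(branches):
    """Bring every branch to the largest spatial size, then stack the channels."""
    out_h = max(len(b[0]) for b in branches)
    out_w = max(len(b[0][0]) for b in branches)
    fused = []
    for b in branches:
        fused.extend(upsample_nearest(b, out_h, out_w))
    return fused


def make_map(channels, h, w, value):
    """Hypothetical helper: a constant feature map, standing in for backbone output."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]


# Three branches at different resolutions (shapes are illustrative only).
branches = [make_map(2, 8, 8, 1.0), make_map(3, 4, 4, 2.0), make_map(4, 2, 2, 3.0)]
fused = concat_fuse(branches)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 9 8 8
```

Concatenation keeps every branch's channels intact (2 + 3 + 4 = 9 here) and leaves it to later layers to weigh them, whereas element-wise fusion would require all branches to share one channel count.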
Experimental results on four widely used public datasets show that FFNet achieves accuracy comparable to that of existing, more complex models, despite its compact structure, small parameter count, and low computational cost. This challenges the assumption that performance gains require added complexity and offers a simple, efficient solution for crowd counting.
The paper also compares three multi-scale feature fusion methods, of which concatenation-based fusion shows advantages for crowd counting. Additionally, the loss function, which combines counting loss, optimal transport loss, and variation loss, improves the accuracy and robustness of the model.
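How three such terms can be combined is sketched below on a 1-D toy density. This is a sketch under stated assumptions, not the paper's loss: the weights `w_ot` and `w_tv` are hypothetical, the variation term uses the 0.5 × L1 convention for total variation distance, and the optimal transport term relies on the fact that in one dimension, with cost |x − y|, the transport cost between two normalized densities equals the summed absolute difference of their cumulative sums. A real 2-D density map would need an actual OT solver.

```python
def normalize(density):
    """Scale a density to sum to 1; an all-zero density is returned unchanged."""
    total = sum(density)
    return [v / total for v in density] if total > 0 else list(density)


def counting_loss(pred, gt):
    """Absolute difference between predicted and ground-truth counts."""
    return abs(sum(pred) - sum(gt))


def ot_loss_1d(pred, gt):
    """1-D transport cost between normalized densities via the CDF difference."""
    p, g = normalize(pred), normalize(gt)
    cost, cdf_gap = 0.0, 0.0
    for pi, gi in zip(p, g):
        cdf_gap += pi - gi
        cost += abs(cdf_gap)
    return cost


def variation_loss(pred, gt):
    """Total variation distance between normalized densities (0.5 * L1)."""
    p, g = normalize(pred), normalize(gt)
    return 0.5 * sum(abs(pi - gi) for pi, gi in zip(p, g))


def total_loss(pred, gt, w_ot=0.1, w_tv=0.01):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (counting_loss(pred, gt)
            + w_ot * ot_loss_1d(pred, gt)
            + w_tv * variation_loss(pred, gt))


# Correct count, but the mass sits one cell away from the ground truth.
pred = [0.0, 2.0, 0.0, 0.0]
gt = [0.0, 0.0, 2.0, 0.0]
print(counting_loss(pred, gt), ot_loss_1d(pred, gt), variation_loss(pred, gt))
```

In this example the counting term is zero because both densities sum to 2, while the transport and variation terms are non-zero, which shows why a count-only loss cannot penalize misplaced density.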