Core Concepts
MCNet is a lightweight crowd density estimation network that integrates a multi-scale attention module to extract and enhance crowd texture features, enabling accurate and efficient crowd density classification in metro video surveillance.
Abstract
The paper proposes a novel Metro Crowd density estimation Network (MCNet) to automatically classify the crowd density level of metro passengers.
Key highlights:
- An Integrating Multi-scale Attention (IMA) module is developed to help plain classifiers extract semantic crowd texture features. The IMA module combines dilated convolution, multi-scale feature extraction, and an attention mechanism to obtain multi-scale crowd feature activations from a larger receptive field at lower computational cost.
- A lightweight crowd texture feature extraction network is designed that processes video frames directly and extracts texture features for crowd density estimation automatically. Its fast image processing and small parameter count allow deployment on embedded platforms with limited hardware resources.
- MCNet is constructed by integrating the IMA module with the lightweight crowd texture feature extraction network. Experiments on benchmark datasets and a large-scale metro crowd dataset (SH METRO) show that MCNet achieves competitive crowd density estimation performance in terms of accuracy, model size, and inference speed, making it suitable for metro video surveillance on embedded platforms.
- Deployment experiments on an embedded device further validate the feasibility of using MCNet for real-time, accurate, and energy-efficient metro crowd density estimation, giving metro managers a practical tool for monitoring passenger flow.
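The IMA module's claim of a larger receptive field at lower computational cost rests on a general property of dilated convolution: a k x k kernel with dilation rate d spans k + (k - 1)(d - 1) positions per side while keeping the same k x k weights. The sketch below only illustrates that arithmetic. The kernel size, dilation rates, and channel count are illustrative assumptions and are not taken from the paper, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length of the input window covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def compare_branches(kernel_size=3, dilations=(1, 2, 3), channels=64):
    """For each assumed dilation rate, compare a dilated kernel with a
    dense kernel that covers the same window.

    Returns tuples of (dilation, window side, dilated weights, dense weights).
    """
    rows = []
    for d in dilations:
        span = effective_kernel_size(kernel_size, d)
        dilated = conv_weight_count(kernel_size, channels, channels)
        dense = conv_weight_count(span, channels, channels)
        rows.append((d, span, dilated, dense))
    return rows


if __name__ == "__main__":
    for d, span, dilated, dense in compare_branches():
        print(f"dilation={d}: window {span}x{span}, "
              f"dilated weights={dilated}, dense weights={dense}")
```

Under these assumed settings, every branch keeps the weight count of a 3 x 3 convolution while the covered window grows with the dilation rate, which is the trade-off the highlight describes.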
Stats
Number of images per crowd density level (high / medium / low) in each dataset:
- PETS2009: 640 / 635 / 648 in the training set; 315 / 279 / 293 in the testing set.
- Mall: 250 / 293 / 278 in the training set; 106 / 122 / 120 in the testing set.
- QUT: 536 / 1049 / 1013 in the training set; 536 / 1049 / 1014 in the testing set.
- SH METRO: 1473 / 345 / 772 in the training set; 685 / 182 / 377 in the testing set.
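To make the counts above easier to compare, the following sketch computes the share of each density level and a simple largest-to-smallest class ratio per split. The counts are copied from the figures above; the helper names and the choice of imbalance measure are assumptions made for illustration and are not part of the paper.

```python
# Image counts per density level, ordered (high, medium, low),
# copied from the dataset statistics listed above.
COUNTS = {
    "PETS2009": {"train": (640, 635, 648), "test": (315, 279, 293)},
    "Mall": {"train": (250, 293, 278), "test": (106, 122, 120)},
    "QUT": {"train": (536, 1049, 1013), "test": (536, 1049, 1014)},
    "SH METRO": {"train": (1473, 345, 772), "test": (685, 182, 377)},
}
LEVELS = ("high", "medium", "low")


def class_shares(counts):
    """Fraction of images in each density level."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 = balanced)."""
    return max(counts) / min(counts)


if __name__ == "__main__":
    for name, splits in COUNTS.items():
        for split, counts in splits.items():
            shares = ", ".join(
                f"{level}={share:.1%}"
                for level, share in zip(LEVELS, class_shares(counts))
            )
            print(f"{name} {split}: {shares}; "
                  f"max/min ratio={imbalance_ratio(counts):.2f}")
```

By this measure the PETS2009 training set is close to balanced, while the SH METRO training set is skewed toward the high-density level, which makes up more than half of its images.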
Quotes
"Aiming at the metro video surveillance system has not been able to effectively solve the metro crowd density estimation problem, a Metro Crowd density estimation Network (called MCNet) is proposed to automatically classify crowd density level of passengers."
"Firstly, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of the plain classifiers to extract semantic crowd texture features to accommodate to the characteristics of the crowd texture feature."
"Secondly, a novel lightweight crowd texture feature extraction network is proposed, which can directly process video frames and automatically extract texture features for crowd density estimation, while its faster image processing speed and fewer network parameters make it flexible to be deployed on embedded platforms with limited hardware resources."