Key Concepts
The authors propose TMVA4D, a convolutional neural network architecture that leverages 4D radar data to robustly detect and segment humans in challenging field conditions with reduced visibility due to airborne particulates.
Summary
The authors present a novel dataset and a deep learning-based approach for human detection using 4D radar data. The key highlights are:
Data Collection: The dataset was recorded with a car-mounted sensor system in several environments, including an underground mine, a car wash, an industrial tent, and an outdoor wooded area. It shows people in different positions performing various actions, with visibility reduced by artificially induced dust, water spray, and smoke.
Data Representation: The 4D radar data is represented as heatmaps in five different views: elevation-azimuth (EA), elevation-range (ER), elevation-Doppler (ED), range-azimuth (RA), and Doppler-azimuth (DA). These heatmaps are used as input to the proposed TMVA4D architecture.
TMVA4D Architecture: TMVA4D is a convolutional neural network architecture based on TMVA-Net, designed for semantic segmentation of humans in the EA view. It takes multiple frames of radar heatmaps as input and outputs segmentation masks for the background and person classes.
Training and Evaluation: TMVA4D is trained and evaluated on the collected dataset. The best-performing model achieves a mean intersection-over-union (mIoU) of 78.2% and a mean Dice score (mDice) of 86.1% on the test set, indicating that the approach is effective for human detection in low-visibility field conditions.
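The five heatmap views listed above can be sketched as 2D projections of a 4D radar power cube indexed by elevation, azimuth, range, and Doppler. The summary does not say how each heatmap is computed. This sketch assumes a max projection over the two dropped axes, though a sum or mean would be equally plausible. The axis order, the helper names (`make_views`, `temporal_windows`), and the window length `k` are all hypothetical. The last helper illustrates the multi-frame input described for TMVA4D by grouping each frame with its most recent predecessors.

```python
from itertools import product

# Hypothetical axis order of the 4D radar cube: cube[e][a][r][d].
AXES = ("elevation", "azimuth", "range", "doppler")

# The five views named in the summary, as (row axis, column axis).
VIEWS = {
    "EA": ("elevation", "azimuth"),
    "ER": ("elevation", "range"),
    "ED": ("elevation", "doppler"),
    "RA": ("range", "azimuth"),
    "DA": ("doppler", "azimuth"),
}


def cube_shape(cube):
    """Return (n_elevation, n_azimuth, n_range, n_doppler) of a nested-list cube."""
    return (len(cube), len(cube[0]), len(cube[0][0]), len(cube[0][0][0]))


def project(cube, row_axis, col_axis, reduce=max):
    """Collapse the two axes not kept in the view with `reduce` (assumed: max)."""
    shape = cube_shape(cube)
    ri, ci = AXES.index(row_axis), AXES.index(col_axis)
    # One bucket of values per (row, column) cell of the output heatmap.
    buckets = [[[] for _ in range(shape[ci])] for _ in range(shape[ri])]
    for idx in product(*(range(n) for n in shape)):
        e, a, r, d = idx
        buckets[idx[ri]][idx[ci]].append(cube[e][a][r][d])
    return [[reduce(values) for values in row] for row in buckets]


def make_views(cube, reduce=max):
    """Build all five 2D heatmaps for one radar frame."""
    return {name: project(cube, rows, cols, reduce) for name, (rows, cols) in VIEWS.items()}


def temporal_windows(frames, k):
    """Group each frame with its k-1 predecessors (hypothetical window length k)."""
    return [frames[t - k + 1 : t + 1] for t in range(k - 1, len(frames))]


if __name__ == "__main__":
    # Toy cube of shape (2, 3, 4, 5) whose values encode their own indices.
    cube = [[[[1000 * e + 100 * a + 10 * r + d for d in range(5)]
              for r in range(4)] for a in range(3)] for e in range(2)]
    views = make_views(cube)
    print({name: (len(v), len(v[0])) for name, v in views.items()})
    # → {'EA': (2, 3), 'ER': (2, 4), 'ED': (2, 5), 'RA': (4, 3), 'DA': (5, 3)}
    print(views["EA"][1][2])  # → 1234
```

To use a sum projection instead, pass `reduce=sum`. The projection choice changes the heatmap intensities but not their shapes.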
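The mIoU and mDice scores mentioned above can be illustrated with a small sketch. It assumes that masks are 2D lists of class indices, using the hypothetical encoding 0 = background and 1 = person. It also assumes that true positives, false positives, and false negatives are accumulated over the whole test set before computing per-class scores, which are then averaged over the two classes. The summary does not state which averaging scheme the authors used.

```python
BACKGROUND, PERSON = 0, 1  # hypothetical class indices


def confusion_counts(preds, targets, cls):
    """Accumulate TP, FP, FN for one class over all pixels of all masks."""
    tp = fp = fn = 0
    for pred_mask, gt_mask in zip(preds, targets):
        for pred_row, gt_row in zip(pred_mask, gt_mask):
            for p, g in zip(pred_row, gt_row):
                if p == cls and g == cls:
                    tp += 1
                elif p == cls:
                    fp += 1
                elif g == cls:
                    fn += 1
    return tp, fp, fn


def miou_mdice(preds, targets, classes=(BACKGROUND, PERSON)):
    """Mean IoU and mean Dice over the given classes."""
    ious, dices = [], []
    for cls in classes:
        tp, fp, fn = confusion_counts(preds, targets, cls)
        if tp + fp + fn == 0:
            continue  # class absent from both prediction and ground truth: skip it
        ious.append(tp / (tp + fp + fn))  # IoU = TP / (TP + FP + FN)
        dices.append(2 * tp / (2 * tp + fp + fn))  # Dice = 2TP / (2TP + FP + FN)
    if not ious:
        raise ValueError("no class occurs in predictions or targets")
    return sum(ious) / len(ious), sum(dices) / len(dices)


if __name__ == "__main__":
    target = [[0, 1], [0, 1]]
    pred = [[0, 1], [1, 1]]
    # Background: IoU 1/2, Dice 2/3. Person: IoU 2/3, Dice 4/5.
    # mIoU = 7/12 ≈ 0.583, mDice = 11/15 ≈ 0.733
    print(miou_mdice([pred], [target]))
```

Averaging per-image scores instead of accumulating counts over the whole set would generally give different numbers, especially for frames that contain no person.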
The authors show that 4D radar data, combined with the TMVA4D architecture, is a viable option for detecting people in challenging environments where other sensor modalities, such as cameras and lidars, may fail because of airborne particulates.
Statistics
The dataset used in this work contains 102,966 frames of 4D radar data and corresponding thermal images. 4,900 (4.8%) of the thermal images were manually annotated, and the remaining images were automatically annotated using a YOLOv8 model.
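The annotation split stated above can be checked directly from the two frame counts:

$$
\frac{4\,900}{102\,966} \approx 0.0476 \approx 4.8\%, \qquad 102\,966 - 4\,900 = 98\,066 \text{ automatically annotated frames}
$$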
The percentage of mask pixels representing the "person" class across the entire dataset is 1.1%, and 61% of the segmentation masks contain at least one pixel representing the "person" class.
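The two statistics above measure class imbalance in different ways: one counts pixels across the whole dataset, the other counts masks. Below is a minimal sketch of both. It assumes each mask is a 2D list of class indices with 1 marking the person class, which is a hypothetical encoding.

```python
PERSON = 1  # hypothetical label value for the "person" class


def mask_statistics(masks, person=PERSON):
    """Return (fraction of person pixels, fraction of masks with any person pixel)."""
    total_pixels = person_pixels = masks_with_person = 0
    for mask in masks:
        count = sum(row.count(person) for row in mask)
        total_pixels += sum(len(row) for row in mask)
        person_pixels += count
        if count > 0:
            masks_with_person += 1
    return person_pixels / total_pixels, masks_with_person / len(masks)


if __name__ == "__main__":
    masks = [
        [[0, 0], [0, 1]],  # one person pixel out of four
        [[0, 0], [0, 0]],  # no person
    ]
    print(mask_statistics(masks))  # → (0.125, 0.5)
```

For this dataset the two values are 1.1% and 61%. The person class is therefore rare at the pixel level even though most masks contain at least one person pixel.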
Quotes
"Radar, on the other hand, is a promising modality that is less affected by, e.g., dust, smoke, water mist or fog."
"Mining is an industry that rapidly adopts autonomous vehicles. Underground operations often involve a serious risk of injury or death; a significant danger is posed by heavy machinery such as dump trucks or drill rigs operating near personnel."