Core Concepts
This survey offers a comprehensive examination of collaborative perception datasets in the context of Vehicle-to-Infrastructure (V2I), Vehicle-to-Vehicle (V2V), and Vehicle-to-Everything (V2X) communication, highlighting the latest developments in large-scale benchmarks that accelerate advancements in perception tasks for autonomous vehicles.
Summary
This survey provides a comprehensive analysis of collaborative perception datasets for autonomous driving. It systematically examines a variety of datasets, comparing them based on aspects such as diversity, sensor setup, quality, public availability, and their applicability to downstream tasks like 3D object detection, object tracking, motion prediction, trajectory prediction, and domain adaptation.
The key highlights of the survey include:
Detailed analysis of road intersection datasets, such as BAAI-VANJEE, IPS300+, Rope3D, TUMTraf-I, and RCooper, which are crucial for refining 3D object detection and localization in complex urban environments.
Comprehensive review of collaborative perception datasets, including V2X-Sim 1.0, V2X-Sim 2.0, OPV2V, DAIR-V2X, V2XSet, DOLPHINS, LUCOOP, V2V4Real, V2X-Seq, DeepAccident, and TUMTraf-V2X. These datasets aim to advance V2V and V2X perception by capturing or simulating complex urban environments and diverse driving scenarios; a minimal sketch of the underlying multi-agent fusion idea follows this list.
Identification of key challenges, such as domain shift, sensor setup limitations, and limited dataset diversity and availability, along with the importance of addressing privacy and security concerns during dataset development.
Emphasis on the necessity for comprehensive, globally accessible datasets and collaborative efforts from both technological and research communities to overcome these challenges and fully harness the potential of autonomous driving.
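Collaborative perception ultimately rests on fusing observations from multiple agents into a shared coordinate frame, either before detection (early fusion) or after (late fusion). The following minimal Python sketch illustrates early fusion of LiDAR point clouds; the (N, 3) point arrays, 4x4 sensor-to-world pose matrices, and all names are illustrative assumptions, not the API of any dataset listed above.

import numpy as np

def to_world(points, pose):
    # points: (N, 3) LiDAR points in the agent's sensor frame.
    # pose:   (4, 4) homogeneous sensor-to-world transform (assumed known).
    homog = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    return (pose @ homog.T).T[:, :3]

def early_fusion(agent_clouds, agent_poses):
    # Concatenate every agent's cloud after mapping it into the world frame.
    return np.vstack([to_world(agent_clouds[a], agent_poses[a])
                      for a in agent_clouds])

# Two hypothetical agents: an ego vehicle and a roadside unit (RSU).
ego_cloud = np.random.rand(1000, 3) * 50   # placeholder points
rsu_cloud = np.random.rand(800, 3) * 50
ego_pose = np.eye(4)                       # ego sensor at the world origin
rsu_pose = np.eye(4)
rsu_pose[:3, 3] = [20.0, 5.0, 4.0]         # RSU offset 20 m ahead, 4 m up

fused = early_fusion({"ego": ego_cloud, "rsu": rsu_cloud},
                     {"ego": ego_pose, "rsu": rsu_pose})
print(fused.shape)  # (1800, 3): one merged cloud for downstream 3D detection

The merged cloud is what a cooperative 3D detector would consume; in practice, bandwidth constraints push many methods toward intermediate- or late-fusion variants rather than transmitting raw points.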
Statistics
"This dataset features 74,000 3D and 105,000 2D object annotations."
"The IPS300+ dataset contains an average of 319.84 labels per frame, which is significantly higher than many existing datasets like KITTI."
"Rope3D includes a collection of 50,000 images and 1.5 million 3D annotations."
"TUMTraf-I comprises 4,800 images and LiDAR point cloud frames, which include over 57,406 labeled 3D annotations."
"RCooper includes 50,000 images and 30,000 point clouds, covering two primary traffic scenes: intersections and corridors."
Quotes
"Integrating data from multiple sources increases the field of view, leading towards a holistic view of the surroundings. This multi-faceted perception enhances safety by providing a more accurate representation of the environment and contributes to more efficient traffic flow and better decision-making capabilities for autonomous vehicles."
"Established single-vehicle datasets such as KITTI, nuScenes, and Waymo do not address the complexity of collaborative perception in addition to limitations such as sensor heterogeneity, communication protocols testing, information fusion, testing and validation of collaborative perception frameworks."