Core Concepts
This survey analyzes 265 autonomous driving datasets, evaluating their impact and annotation quality and outlining future development trends.
Abstract
This survey provides an in-depth analysis of autonomous driving datasets from multiple perspectives:
Dataset Collection and Evaluation:
The authors systematically collected 265 publicly available autonomous driving datasets through extensive searches.
They introduced a novel "impact score" metric to evaluate the significance and influence of each dataset, considering factors like citation count, data dimensions, and environmental diversity.
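To make the idea of a composite impact score concrete, here is a minimal sketch in Python. It is not the authors' formula: the summary only names the factors (citation count, data dimensions, environmental diversity), so the min-max normalization, the weights, the `DatasetRecord` fields, and the placeholder dataset names and values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    # All fields are hypothetical stand-ins for the factors named in the summary.
    name: str
    citations: float        # raw citation count
    data_dimension: float   # e.g. a size or sensor-coverage measure
    env_diversity: float    # e.g. number of weather/time-of-day conditions covered


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def impact_scores(records, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of normalized factors. The weights are assumed, not from the survey."""
    w_cite, w_dim, w_env = weights
    cite = min_max([r.citations for r in records])
    dim = min_max([r.data_dimension for r in records])
    env = min_max([r.env_diversity for r in records])
    return {
        r.name: w_cite * c + w_dim * d + w_env * e
        for r, c, d, e in zip(records, cite, dim, env)
    }


# Placeholder records with made-up values, purely to exercise the function.
records = [
    DatasetRecord("dataset_a", citations=1000, data_dimension=5, env_diversity=2),
    DatasetRecord("dataset_b", citations=200, data_dimension=10, env_diversity=6),
    DatasetRecord("dataset_c", citations=50, data_dimension=1, env_diversity=1),
]
scores = impact_scores(records)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Normalizing each factor before weighting keeps a factor with a large raw range, such as citation count, from dominating the sum purely because of its scale.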
Sensor Modalities and Sensing Domains:
The survey discusses the various sensor types used in autonomous driving, including cameras, LiDARs, radars, event-based cameras, and thermal cameras, along with their characteristics and applications.
It also categorizes the sensing domains into onboard, Vehicle-to-Everything (V2X), drone-based, and others, highlighting the advantages and limitations of each.
Autonomous Driving Tasks:
The survey covers the key tasks in autonomous driving, such as perception and localization, prediction, planning, and control, outlining their objectives, required data, and evaluation metrics.
Analysis of High-Impact Datasets:
The authors provide in-depth discussions of several high-impact datasets, including KITTI, Cityscapes, nuScenes, and Waymo, examining their strengths, limitations, and suitability for different tasks.
Annotation Quality and Process:
The survey investigates the annotation processes and tools used for autonomous driving datasets, emphasizing the importance of establishing standard annotation pipelines to ensure high-quality labels.
Data Distribution Analysis:
The authors present detailed statistical analyses of the data distribution across various datasets, highlighting their inherent biases and suitable use cases.
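As a rough illustration of this kind of distribution analysis, the sketch below computes per-class label proportions and a simple imbalance ratio. The function names, the class labels, and the counts are hypothetical; the summary does not say which statistics the authors actually report.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each class among the given annotation labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels provided")
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels provided")
    return max(counts.values()) / min(counts.values())


# Made-up labels standing in for one dataset's object annotations.
labels = ["car"] * 6 + ["pedestrian"] * 3 + ["cyclist"] * 1
dist = class_distribution(labels)
ratio = imbalance_ratio(labels)
print(dist, ratio)
```

A high imbalance ratio is one simple signal of the kind of bias mentioned above: a model trained on such data sees far fewer examples of the rare classes.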
Future Trends and Challenges:
The survey discusses emerging trends and potential research directions in autonomous driving datasets, such as integrating language, generating data using Vision-Language Models, and promoting open data ecosystems.
The survey is intended as a resource for researchers and practitioners in autonomous driving, supporting informed dataset selection and guiding the development of future datasets.
Stats
The Waymo Open Dataset provides a large volume of multimodal sensor data with high-quality annotations, covering a wide range of driving conditions and geographies.
The BDD100K dataset is known for its large size and diversity, which contribute to the robustness and generalizability of autonomous driving algorithms.
The nuScenes dataset covers diverse urban scenes and environmental conditions, with a multimodal sensor setup that includes LiDAR, radars, and cameras.
Quotes
"High-quality datasets are fundamental for developing reliable autonomous driving algorithms."
"We present an exhaustive study of 265 autonomous driving datasets from multiple perspectives, including sensor modalities, data size, tasks, and contextual conditions."
"We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets."