Core Concepts
Researchers are increasingly turning to synthetic datasets, which make it easy to generate rich and configurable data, as an effective complement to real-world datasets for improving algorithm performance in autonomous driving tasks.
Abstract
The paper discusses the evolution of synthetic dataset generation methods for autonomous driving perception tasks. It highlights the role of synthetic datasets in evaluation, gap testing, and algorithm testing. Various stages of development and key datasets are explored, emphasizing the importance of bridging the domain gap between synthetic and real-world data.
The content traces synthetic data from its inception in the 1960s for computer vision algorithms to its modern applications in autonomous driving. It details the creation of synthetic datasets such as FRIDA, MPI Sintel, Flying Things, the GTA-V dataset, SYNTHIA, VEIS, Foggy Cityscapes, IDDA, CarlaScenes, Virtual KITTI2, SHIFT, V2X-Sim, AIODrive, and OPV2V.
Key metrics such as frame counts and sensor suites are highlighted along with tasks covered by each dataset. The discussion extends to evaluating synthetic datasets' effectiveness for training algorithms and transferring conclusions to real-world scenarios. Strategies to bridge appearance and content gaps between synthetic and real data are also explored.
Stats
FRIDA provides 90 synthetic images of urban road scenes with different types of fog.
MPI Sintel contains 1K pairs of images for optical flow estimation.
Flying Things offers over 25K stereo image pairs for disparity estimation and scene flow estimation.
The GTA-V dataset includes 25K images with pixel-level semantic segmentation annotations, extracted from a commercial video game.
SYNTHIA features over 213K synthetic images captured from different viewpoints with pixel-level annotations for various categories.
VEIS consists of 61K frames annotated with instance segmentation information using the Unity3D game engine.
Foggy Cityscapes comprises 20K images with fine-grained semantic annotations created using the MATLAB platform.
IDDA contains over 1M images with pixel-level semantic information generated on the Carla platform.
CarlaScenes provides diverse scenarios, such as uphill/downhill roads and rural environments, for odometry measurement using cameras, LiDARs, and IMU sensors on the Unreal Engine 4 platform.
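For readers who want to compare these figures programmatically, the stats above can be collected into a small table. The following is only a minimal sketch: the `DatasetStat` record, its field names, and the `largest_first` helper are hypothetical and not part of the paper. Counts are the approximate values quoted above, with "over N" stored as N, and CarlaScenes is omitted because no count is given for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStat:
    """Hypothetical record for one line of the Stats list."""
    name: str
    approx_size: int  # approximate count as quoted; "over N" is stored as N
    unit: str         # what is being counted (images, pairs, frames)
    note: str         # task or content as described in the Stats list


# Values copied from the Stats list above; nothing here is measured independently.
STATS = [
    DatasetStat("FRIDA", 90, "images", "urban road scenes with fog"),
    DatasetStat("MPI Sintel", 1_000, "image pairs", "optical flow estimation"),
    DatasetStat("Flying Things", 25_000, "stereo image pairs",
                "disparity and scene flow estimation"),
    DatasetStat("GTA-V", 25_000, "images", "pixel-level semantic segmentation"),
    DatasetStat("SYNTHIA", 213_000, "images", "pixel-level annotations"),
    DatasetStat("VEIS", 61_000, "frames", "instance segmentation"),
    DatasetStat("Foggy Cityscapes", 20_000, "images",
                "fine-grained semantic annotations"),
    DatasetStat("IDDA", 1_000_000, "images", "pixel-level semantic information"),
]


def largest_first(stats):
    """Return the records sorted by approximate size, largest first."""
    return sorted(stats, key=lambda s: s.approx_size, reverse=True)


if __name__ == "__main__":
    for s in largest_first(STATS):
        print(f"{s.name:<18}{s.approx_size:>10,} {s.unit:<20}{s.note}")
```

Because the units differ (images, image pairs, frames), the ordering is only a rough indication of scale rather than a like-for-like comparison.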
Quotes
"We propose a framework for evaluating synthetic datasets to facilitate the generation of trustworthy datasets." - Content
"Synthetic datasets play a central role in previous research efforts related to autonomous driving development." - Content
"The emergence of large language models (LLM) combined with LLM simulates various controlled environments." - Content