Core Concepts
Dragtraffic is a generalized, point-based, controllable traffic scene generation framework. It combines an adaptive mixture-of-experts architecture with conditional diffusion modeling so that non-experts can create a variety of realistic driving scenarios for different types of traffic agents.
Abstract
The paper proposes Dragtraffic, a traffic scene generation framework that addresses the limitations of existing methods in terms of controllability, accuracy, and versatility. The key highlights are:
Dragtraffic uses a regression model to produce an initial guess for the traffic scene, then refines that guess with a conditional diffusion model so that the generated scenes are both realistic and diverse.
The framework adopts an adaptive mixture-of-experts architecture with a separate model dedicated to each type of traffic agent, such as vehicles, pedestrians, and cyclists. This improves how well the framework generalizes across agent types.
Dragtraffic introduces user-customized context through cross-attention, enabling a high degree of controllability. Users can interactively generate and edit traffic scenes by dragging and typing context information, such as agent type, position, velocity, and orientation.
Experiments on a real-world driving dataset show that Dragtraffic outperforms existing methods in terms of authenticity, diversity, and freedom, making it a promising tool for the evaluation and training of autonomous driving systems.
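The control flow described in the highlights above (pick the expert for the agent type, produce an initial guess, then refine it under a user-supplied condition) can be illustrated with a toy sketch. This is not the authors' implementation: the constant-velocity rollout merely stands in for the regression model, and `toy_refine` is a simple noise-then-relax loop that marks where a trained conditional diffusion model would sit. All names and parameters here (`EXPERTS`, `noise_scale`, `generate`, `goal`) are hypothetical.

```python
import random

# Hypothetical per-type settings; in the paper each agent type has its own model.
EXPERTS = {
    "vehicle": {"noise_scale": 0.5},
    "pedestrian": {"noise_scale": 0.1},
    "cyclist": {"noise_scale": 0.3},
}


def constant_velocity_guess(position, velocity, horizon=10, dt=0.1):
    """Stand-in for the regression model: roll the agent forward at constant velocity."""
    x0, y0 = position
    vx, vy = velocity
    return [(x0 + vx * dt * k, y0 + vy * dt * k) for k in range(1, horizon + 1)]


def toy_refine(trajectory, goal, rng, noise_scale, n_steps=20, rate=0.3):
    """Placeholder for the refinement stage (NOT a real diffusion model).

    Perturbs the initial guess with Gaussian noise, then relaxes it step by step
    toward a target whose endpoint is the user-specified goal (the condition).
    """
    n = len(trajectory)
    last_x, last_y = trajectory[-1]
    # Target: the initial guess, shifted linearly so that its endpoint lands on the goal.
    target = [
        (x + (k / (n - 1)) * (goal[0] - last_x), y + (k / (n - 1)) * (goal[1] - last_y))
        for k, (x, y) in enumerate(trajectory)
    ]
    current = [
        (x + rng.gauss(0, noise_scale), y + rng.gauss(0, noise_scale))
        for x, y in trajectory
    ]
    for _ in range(n_steps):
        current = [
            (cx + rate * (tx - cx), cy + rate * (ty - cy))
            for (cx, cy), (tx, ty) in zip(current, target)
        ]
    return current


def generate(agent_type, position, velocity, goal, seed=0):
    """Dispatch to the expert for this agent type, then run guess + refinement."""
    expert = EXPERTS[agent_type]  # raises KeyError for unsupported agent types
    rng = random.Random(seed)
    guess = constant_velocity_guess(position, velocity)
    return toy_refine(guess, goal, rng, noise_scale=expert["noise_scale"])
```

Calling `generate("cyclist", (0.0, 0.0), (5.0, 0.0), goal=(6.0, 1.0))` returns a 10-point trajectory that ends close to the requested goal. Different seeds give slightly different paths, a very rough analogue of the diversity that the diffusion stage is meant to provide.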
Stats
The dataset consists of around 70,000 scenarios, each with 20-second trajectories. The authors split each 20-second scenario into 6-second intervals and removed scenarios with fewer than 32 agents. They then cropped a rectangular area with a 120-meter side length centered on the ego agent and classified scenarios into three datasets: ego-centered, cyclist-centered, and pedestrian-centered.
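The preprocessing steps above can be sketched as three small helpers. Several details are assumptions, since the summary does not state them: a 10 Hz sampling rate (so 20 s is 200 frames and 6 s is 60 frames), non-overlapping windows with the leftover frames dropped, a square crop with inclusive boundaries, and all function names.

```python
def split_into_windows(n_frames, window_frames):
    """Return (start, end) frame ranges for non-overlapping windows.

    Frames left over after the last full window are dropped (an assumption).
    """
    return [
        (start, start + window_frames)
        for start in range(0, n_frames - window_frames + 1, window_frames)
    ]


def has_enough_agents(agents, min_agents=32):
    """Filter from the summary: discard scenarios with fewer than 32 agents."""
    return len(agents) >= min_agents


def crop_around(center, agents, side=120.0):
    """Keep agents inside a square of the given side length centered on `center`.

    `agents` maps an agent id to its (x, y) position in meters.
    """
    half = side / 2
    cx, cy = center
    return {
        agent_id: (x, y)
        for agent_id, (x, y) in agents.items()
        if abs(x - cx) <= half and abs(y - cy) <= half
    }
```

Under the assumed 10 Hz rate, `split_into_windows(200, 60)` yields three windows and discards the final 2 seconds. `crop_around((0.0, 0.0), agents)` keeps only agents within 60 m of the ego agent along each axis.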
Quotes
"Dragtraffic enables non-experts to generate a variety of realistic driving scenarios for different types of traffic agents through an adaptive mixture expert architecture."
"We use a regression model to provide a general initial solution and a refinement process based on the conditional diffusion model to ensure diversity."
"User-customized context is introduced through cross-attention to ensure high controllability."