DriveDreamer-2 leverages a large language model (LLM) to generate user-customized driving videos, surpassing state-of-the-art methods in quality and diversity. The framework includes an HDMap generator and UniMVM for enhanced video coherence.
Given a user description such as a rainy-day scene, DriveDreamer-2 generates multi-view driving videos matching that description. It increases the diversity of synthetic data, surpasses other methods in generation quality, and can produce uncommon driving scenarios such as vehicles abruptly cutting in.
World models have become pivotal in autonomous driving, and DriveDreamer-2 is presented as the first to generate customized driving videos efficiently. Through its LLM interface, agent trajectories are generated from user text queries, and the resulting synthetic videos improve the training of various driving perception methods.
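The summary does not spell out how an LLM's reply is turned into agent trajectories. A minimal sketch of one plausible step is parsing a structured JSON reply into per-agent waypoint lists; the schema, field names, and example reply below are illustrative assumptions, not the paper's actual interface.

```python
import json

def parse_agent_trajectories(llm_response: str) -> dict:
    """Parse an LLM's JSON reply into per-agent waypoint lists.

    Hypothetical schema (not from the paper):
    {"agents": [{"id": "...", "waypoints": [[x, y], ...]}, ...]}
    """
    data = json.loads(llm_response)
    return {agent["id"]: agent["waypoints"] for agent in data["agents"]}

# A reply an LLM might return for "a vehicle abruptly cutting in":
reply = json.dumps({
    "agents": [
        {"id": "ego", "waypoints": [[0, 0], [0, 10], [0, 20]]},
        {"id": "cut_in_car", "waypoints": [[3, 5], [1, 12], [0, 18]]},
    ]
})
trajectories = parse_agent_trajectories(reply)
print(sorted(trajectories))  # ['cut_in_car', 'ego']
```

Downstream components (such as the HDMap generator) could then consume these waypoint lists as conditioning signals.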
The HDMap generator synthesizes road structures conditioned on the agent trajectories, ensuring that background elements stay aligned with foreground traffic. UniMVM unifies multi-view video generation, improving temporal and spatial coherence across views.
Experimental results show that DriveDreamer-2 markedly improves FID and FVD scores over previous methods, and that its synthetic videos boost training for downstream 3D object detection and multi-object tracking tasks through data augmentation.
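Both FID and FVD compare generated and real samples via the Fréchet distance between Gaussians fitted to deep features (Inception features for FID, video-network features for FVD); lower is better. The 1-D special case below keeps the arithmetic visible; real FID uses the multivariate form with mean vectors and covariance matrices.

```python
import math

def frechet_distance_1d(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Frechet distance between two 1-D Gaussians.

    d^2 = (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
    FID/FVD apply the multivariate analogue to feature statistics.
    """
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

# Identical distributions score 0; the score grows as the moments diverge.
print(frechet_distance_1d(0.0, 1.0, 0.0, 1.0))  # 0.0
print(frechet_distance_1d(0.0, 1.0, 2.0, 1.0))  # 4.0
```

This is why a lower FID/FVD indicates that generated videos are statistically closer to real footage in feature space.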
Key insights extracted from the original content by Guosheng Zha... at arxiv.org, 03-12-2024: https://arxiv.org/pdf/2403.06845.pdf