Core Concepts
Establishing a generalized video prediction paradigm for autonomous driving with the GenAD model.
Abstract
GenAD is a large-scale video prediction model for autonomous driving. It addresses challenges in generalization and training efficiency by leveraging diverse data sources and novel temporal reasoning blocks. The model shows strong generalization across driving scenarios, including zero-shot domain transfer, language-conditioned prediction, action-conditioned prediction, and motion planning. Through two-stage learning, GenAD predicts future frames accurately and efficiently.
Stats
OpenDV-2K dataset contains over 2000 hours of driving videos.
GenAD trained on OpenDV-2K achieves an FVD of 184.
GenAD surpasses previous models in image fidelity (FID) and video coherence (FVD), where lower scores are better.
GenAD-nus, trained only on the nuScenes dataset, performs on par with GenAD on nuScenes but struggles to generalize to unseen datasets such as Waymo.
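Both metrics above (FID for images, FVD for videos) reduce to the same Fréchet distance between two Gaussians fitted to feature embeddings of real and generated samples. A minimal sketch, assuming the features have already been extracted (in practice Inception features for FID, I3D features for FVD; the function name and array shapes here are illustrative, not from the paper):

```python
# Sketch: Frechet distance underlying FID/FVD, computed between two
# sets of pre-extracted feature vectors (shape: [num_samples, feat_dim]).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """||mu_a - mu_b||^2 + Tr(C_a + C_b - 2*(C_a C_b)^(1/2))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        # Tiny imaginary parts are a numerical artifact of sqrtm.
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Identical feature sets yield a (numerically) zero distance;
# the further apart the two distributions, the larger the score.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 8))
print(frechet_distance(feats, feats))
```

This is why "lower is better" for both metrics: a model whose generated-frame feature distribution matches the real one drives the distance toward zero.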
Quotes
"We aim to establish a generalized video prediction paradigm for autonomous driving."
"GenAD can be adapted into an action-conditioned prediction model or a motion planner."
"GenAD exhibits remarkable zero-shot generalization ability and visual quality."