The paper presents UniPAD, a universal pre-training paradigm for 3D representation learning in the context of autonomous driving. The key highlights are:
UniPAD employs 3D differentiable rendering to reconstruct the complete geometric and appearance characteristics of the input data, which can be either 3D LiDAR point clouds or multi-view images. This enables the model to learn a continuous 3D representation beyond low-level statistics.
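To make the rendering step concrete, below is a minimal sketch of differentiable volume rendering along a batch of rays, written in PyTorch. The `field` network, the density-based parameterization, and all tensor shapes are illustrative assumptions for exposition; UniPAD's actual renderer may differ in its scene parameterization and supervision.

```python
# Minimal sketch of differentiable volume rendering along a batch of rays.
# The field network, the density parameterization, and the shapes here are
# illustrative assumptions, not UniPAD's actual implementation.
import torch

def render_rays(field, origins, directions, n_samples=64, near=0.5, far=60.0):
    """Composite color and depth along rays via volume rendering.

    field(points) -> (density [N, S, 1], rgb [N, S, 3]); both differentiable.
    origins, directions: [N, 3] ray origins and unit directions.
    """
    # Uniformly spaced depths along each ray: [N, S]
    t = torch.linspace(near, far, n_samples, device=origins.device)
    t = t.expand(origins.shape[0], n_samples)

    # 3D sample points: [N, S, 3]
    points = origins[:, None, :] + t[..., None] * directions[:, None, :]
    density, rgb = field(points)

    # Distances between adjacent samples: [N, S]
    delta = torch.cat([t[:, 1:] - t[:, :-1],
                       torch.full_like(t[:, :1], 1e10)], dim=-1)

    # Opacity and transmittance (standard volume-rendering quadrature).
    alpha = 1.0 - torch.exp(-density.squeeze(-1) * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]
    weights = alpha * trans                         # [N, S]

    color = (weights[..., None] * rgb).sum(dim=1)   # [N, 3]
    depth = (weights * t).sum(dim=1)                # [N]
    return color, depth
```

Because every step is differentiable, reconstruction losses on the rendered color and depth can propagate gradients back through the field into the backbone being pre-trained.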
UniPAD's flexibility allows it to be easily integrated into both 2D and 3D frameworks, enabling a more holistic understanding of driving scenes.
UniPAD introduces a memory-efficient ray sampling strategy that reduces the computational burden of the rendering process, which is crucial for keeping pre-training practical.
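The exact sampling strategy is not detailed in this summary; as one hedged illustration of the idea, the sketch below renders only a fixed budget of rays per view rather than every pixel, restricting the candidates to pixels with valid projected LiDAR depth. The function name `sample_rays`, the validity mask, and the `rays_per_view` budget are hypothetical choices, not the paper's method.

```python
# A minimal sketch of subsampling rays per view to bound rendering memory.
# Masking by projected LiDAR depth is an illustrative assumption; the
# paper's actual strategy may select or weight rays differently.
import torch

def sample_rays(depth_map, rays_per_view=512):
    """Pick a small random subset of pixels (rays) to render.

    depth_map: [H, W] projected LiDAR depth, 0 where no point projects.
    Returns pixel indices [K, 2] and their target depths [K].
    """
    valid = (depth_map > 0).nonzero(as_tuple=False)   # [M, 2] (row, col)
    if valid.shape[0] == 0:
        return valid, depth_map.new_empty(0)
    k = min(rays_per_view, valid.shape[0])
    choice = torch.randperm(valid.shape[0], device=depth_map.device)[:k]
    pixels = valid[choice]                            # [K, 2]
    targets = depth_map[pixels[:, 0], pixels[:, 1]]   # [K]
    return pixels, targets
```

Rendering a few hundred rays per view instead of a full image keeps the number of sampled 3D points, and thus activation memory, roughly constant regardless of image resolution.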
Extensive experiments on the nuScenes dataset demonstrate that UniPAD outperforms previous self-supervised pre-training methods: it significantly improves performance on downstream 3D perception tasks, including 3D object detection and 3D semantic segmentation, achieving state-of-the-art results.
The authors show that UniPAD can be seamlessly applied across input modalities (LiDAR-only, camera-only, and LiDAR-camera fusion) and across backbone architectures, showcasing its strong generalization ability.