Core Concept
Leveraging the generalization and robustness of visual foundation models like SAM to enhance the resilience of multi-modal 3D object detection in autonomous driving scenarios.
Abstract
The paper proposes a robust framework called RoboFusion that leverages visual foundation models (VFMs) like SAM to tackle out-of-distribution (OOD) noise scenarios in multi-modal 3D object detection for autonomous driving.
Key highlights:
- Adapts the original SAM for autonomous driving scenarios, named SAM-AD, and introduces AD-FPN to align SAM with multi-modal 3D object detectors.
- Employs wavelet decomposition to denoise the depth-guided image features, further mitigating noise and weather-induced interference.
- Utilizes self-attention mechanisms to adaptively reweight the fused features, enhancing informative features while suppressing excessive noise.
- Validates RoboFusion's robustness against OOD noise scenarios in KITTI-C and nuScenes-C datasets, achieving state-of-the-art performance amid noise.
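The wavelet-based denoising step in the highlights above can be illustrated with a minimal single-level 2D Haar decomposition: the image is split into a low-frequency approximation and three detail subbands, the detail subbands (where high-frequency noise concentrates) are soft-thresholded, and the image is reconstructed. This is a generic sketch of the technique, not RoboFusion's actual implementation; the function names and the fixed threshold are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(img):
    # Single-level 2D Haar transform; img is a 2D array with even dims.
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse of haar_dwt2.
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img

def wavelet_denoise(img, thresh=0.1):
    # Soft-threshold the detail subbands, keep the approximation intact.
    ll, lh, hl, hh = haar_dwt2(img)
    soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    return haar_idwt2(ll, soft(lh), soft(hl), soft(hh))
```

With `thresh=0` the pipeline is lossless (perfect reconstruction); increasing the threshold trades fine detail for noise suppression.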
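The adaptive reweighting of fused features described above follows the standard scaled dot-product self-attention pattern: each fused token is rewritten as an affinity-weighted mixture of all tokens, which amplifies mutually consistent (informative) features and dilutes outlier noise. A minimal NumPy sketch, with illustrative names and no learned projection matrices (RoboFusion's actual module will differ):

```python
import numpy as np

def self_attention_reweight(fused):
    """Reweight fused multi-modal features via self-attention.

    fused: (N, D) array of N token features of dimension D.
    Queries, keys, and values are all the fused features themselves
    (a simplifying assumption for this sketch).
    """
    n, d = fused.shape
    scores = fused @ fused.T / np.sqrt(d)          # (N, N) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ fused                         # reweighted features
```

Tokens that agree with many others receive high attention mass, so their features dominate the output, while isolated noisy tokens are averaged toward the consensus.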
The paper demonstrates that RoboFusion gradually reduces noise by leveraging the generalization and robustness of VFMs, thereby enhancing the resilience of multi-modal 3D object detection for autonomous driving.
Statistics
The paper presents several key statistics and figures to support the authors' arguments:
The authors employ Gaussian distributions to represent the distributional disparities between clean and noisy datasets, showing a large gap in data distribution.
Comparison of SOTA methods and RoboFusion on the KITTI Moderate-level Car AP, where RoboFusion outperforms the top method LoGoNet by a margin of 23.12% mAP in noisy scenarios.
Comparison of SOTA methods and RoboFusion on the nuScenes validation set, where RoboFusion achieves the best mAP performance across various noise conditions.
Quotes
"Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). However, while achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments."
"Inspired by the success of VFMs in CV tasks, in this work, we intend to use these models to tackle the challenges of multi-modal 3D object detectors in OOD noise scenarios."
"Consequently, our RoboFusion achieves state-of-the-art performance in noisy scenarios, as demonstrated by the KITTI-C and nuScenes-C benchmarks."