The paper proposes a framework called CLIX3D to address the problem of domain generalization in 3D object detection. The key insights are:
Multimodal fusion of LiDAR and image data can improve the robustness of 3D object detectors to unseen domain shifts, as the two modalities provide complementary information and are affected differently by changes in environmental conditions.
Performing supervised contrastive learning on region-level features, by aligning features of the same object category across different domains and pushing apart features of different categories, can encourage the learning of domain-invariant representations.
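To make the second insight concrete, here is a minimal sketch of a generic supervised contrastive loss applied to region-level feature vectors. It is not the paper's implementation: the function names `l2_normalize` and `supcon_loss`, the temperature value, and the choice to treat every same-category region as a positive regardless of its source domain are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    # Scale a feature vector to unit length; an all-zero vector is left unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over region features (illustrative).

    features: list of equal-length float lists, one per region, pooled
              across the source domains in a batch (assumed layout).
    labels:   object category of each region.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other regions of the same category (any domain).
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled cosine similarity to every other region.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability over this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two regions per category: same-category features point the same way.
aligned = supcon_loss([[1, 0], [1, 0], [0, 1], [0, 1]], [0, 0, 1, 1])
# Same features, but the labels now pair up orthogonal features.
mixed = supcon_loss([[1, 0], [1, 0], [0, 1], [0, 1]], [0, 1, 0, 1])
print(aligned < mixed)
```

The loss is small when regions of the same category sit close together and far from other categories, which is the behavior the summary describes. Restricting positives to regions drawn from a different domain than the anchor would be one way to emphasize cross-domain alignment, though whether the paper does so is not stated here.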
The paper first introduces a multi-stage LiDAR-image fusion module called MSFusion, which outperforms prior fusion methods. It then presents the CLIX3D framework that combines this multimodal fusion with the supervised contrastive learning approach to train 3D object detectors that generalize better to unseen target domains. Experiments on multiple autonomous driving datasets demonstrate the effectiveness of the proposed approach in improving domain generalization performance compared to direct transfer and single-source domain generalization baselines.
Key insights distilled from
by Deepti Hegde... at arxiv.org, 04-19-2024
https://arxiv.org/pdf/2404.11764.pdf