The paper proposes a framework called CLIX3D to address the problem of domain generalization in 3D object detection. The key insights are:
Multimodal fusion of LiDAR and image data can improve the robustness of 3D object detectors to unseen domain shifts, as the two modalities provide complementary information and are affected differently by changes in environmental conditions.
Supervised contrastive learning on region-level features, which pulls together features of the same object category across different domains and pushes apart features of different categories, encourages the learning of domain-invariant representations.
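The contrastive objective in the second insight can be illustrated with a minimal sketch. This is the generic supervised contrastive loss, not necessarily the exact formulation used in CLIX3D, which this summary does not spell out. The function names, the temperature value, and the toy 2-D features are all assumptions made for illustration; in practice the features would be region-level embeddings pooled from multiple source domains.

```python
import math


def l2_normalize(v):
    # Scale a feature vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of region features.

    features: list of feature vectors, one per region, from any mix of domains
    labels:   list of category ids, one per region
    Hypothetical helper for illustration; not the paper's implementation.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other regions with the same category, regardless of domain.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity between the anchor and every other region.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy features: two tight clusters, standing in for two object categories.
toy_features = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
loss_clustered = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_scattered = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
print(loss_clustered < loss_scattered)
```

When same-category features are already close together the loss is small, and when labels cut across the clusters it is large, which is the pressure that drives category-consistent, domain-invariant features.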
The paper first introduces a multi-stage LiDAR-image fusion module called MSFusion, which outperforms prior fusion methods. It then presents the CLIX3D framework, which combines this multimodal fusion with supervised contrastive learning to train 3D object detectors that generalize better to unseen target domains. Experiments on multiple autonomous driving datasets show that the approach improves domain generalization performance over direct transfer and single-source domain generalization baselines.
Key insights derived from the paper by Deepti Hegde... at arxiv.org, 04-19-2024
https://arxiv.org/pdf/2404.11764.pdf