Core Concepts
This paper introduces OneDet3D, a novel 3D object detection model capable of generalizing across diverse indoor and outdoor datasets with a single set of parameters, addressing the limitations of existing detectors restricted to single-domain training.
Abstract
OneDet3D: A Universal 3D Object Detection Model for Multi-Domain Point Clouds
This research paper presents OneDet3D, a novel approach to 3D object detection that overcomes the limitations of existing methods by enabling training and inference on point clouds from multiple domains using a single model.
Research Objective:
The study aims to address the challenge of domain-specific training in 3D object detection, where models trained on one dataset often fail to generalize to others. The authors propose OneDet3D, a universal model capable of learning from diverse indoor and outdoor point clouds and generalizing to unseen domains and categories.
Methodology:
OneDet3D leverages a fully sparse architecture with 3D sparse convolution for feature extraction and an anchor-free detection head for 3D bounding box prediction. To mitigate data-level interference arising from differences in point cloud characteristics, the authors introduce domain-aware partitioning, which separates parameters related to data scatter and context learning based on the input domain. Additionally, language-guided classification using CLIP embeddings addresses category-level interference caused by inconsistent label spaces across datasets.
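The two interference-mitigation ideas above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the domain names, per-domain parameters, and toy vectors standing in for point-cloud features and CLIP text embeddings are all hypothetical.

```python
import math

# Domain-aware partitioning (illustrative): scatter-related parameters are
# kept separately per domain, while the rest of the network would be shared.
domain_params = {
    "indoor": {"scale": 1.0, "shift": 0.0},
    "outdoor": {"scale": 0.1, "shift": 0.5},  # outdoor ranges are far larger
}

def normalize_features(features, domain):
    """Apply the domain-specific scatter parameters to a feature vector."""
    p = domain_params[domain]
    return [f * p["scale"] + p["shift"] for f in features]

# Language-guided classification (illustrative): instead of a fixed label
# head, score a feature against per-class text embeddings (CLIP-style) so
# that datasets with different label spaces share one classifier.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(feature, text_embeddings):
    """Pick the class whose text embedding is most similar to the feature."""
    return max(text_embeddings, key=lambda name: cosine(feature, text_embeddings[name]))

feat = normalize_features([8.0, 1.0], "outdoor")
print(classify(feat, {"car": [1.0, 0.1], "chair": [0.1, 1.0]}))  # -> car
```

Because classification reduces to similarity against text embeddings, adding a dataset with new category names does not require changing the head's output dimension, which is the point of using language guidance to absorb label-space inconsistencies.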
Key Findings:
- OneDet3D achieves comparable or superior performance to state-of-the-art single-dataset trained models on benchmark datasets like SUN RGB-D, ScanNet, KITTI, and nuScenes, demonstrating its ability to learn universal 3D object detection knowledge.
- The model exhibits strong generalization capabilities, achieving significant performance improvements in cross-domain evaluations on S3DIS and Waymo datasets, highlighting the effectiveness of multi-dataset training.
- Ablation studies confirm the importance of domain-aware partitioning and language-guided classification in mitigating interference and enhancing performance.
Main Conclusions:
OneDet3D presents a significant advancement in 3D object detection by enabling a single model to generalize across diverse domains, categories, and scenes. This research paves the way for universal 3D object detection models and 3D foundation models.
Significance:
This research significantly contributes to the field of computer vision by introducing a universal 3D object detection model, addressing a critical limitation of existing methods. The proposed approach has the potential to accelerate the development of robust and adaptable 3D perception systems for various applications, including autonomous driving, robotics, and augmented reality.
Limitations and Future Research:
While OneDet3D demonstrates promising results, future research could incorporate more diverse datasets and explore alternative domain-adaptation techniques to further enhance generalization. Additionally, investigating the model's performance on resource-constrained platforms could broaden its applicability.
Key Statistics
Indoor and outdoor point clouds differ substantially in spatial range, by a factor of roughly 10 to nearly 20.
On the SUN RGB-D dataset, OneDet3D achieves a 65.0% AP25, surpassing FCAF3D by 1.2%.
On the outdoor KITTI dataset, OneDet3D performs comparably to PV-RCNN.
On nuScenes, OneDet3D's AP surpasses existing methods such as VoxelNeXt and UVTR.
After multi-dataset joint training, OneDet3D outperforms its own single-dataset-trained counterpart by 1.8% on both the SUN RGB-D and KITTI datasets.
On the SUN RGB-D dataset, OneDet3D achieves an APnovel improvement of over 5.94% compared to CoDA.
On the ScanNet dataset, OneDet3D achieves a 15.52% APnovel, surpassing CoDA by more than 9%.
After training on both SUN RGB-D and ScanNet datasets, the cross-domain AP on S3DIS improves by more than 4%.
With the introduction of two outdoor datasets (KITTI and nuScenes), AP25 on S3DIS improves by 0.9%.
Through multi-dataset training on KITTI and nuScenes, OneDet3D achieves a substantial 23.1% improvement in cross-dataset AP3D on Waymo.
Language embeddings contribute to a more than 2% improvement in AP in cross-dataset experiments on S3DIS.
Quotations
"Unlike mature 2D detectors [29, 14, 38, 4], which once trained, can generally conduct inference on different types of images in various scenes and environments, current 3D detectors still follow a single-dataset training-and-testing paradigm."
"To the best of our knowledge, this is the first 3D detector that supports point clouds from domains in both indoor and outdoor simultaneously with only one set of parameters."