Core Concepts
Find n’ Propagate is a novel approach that improves open-vocabulary 3D object detection in urban environments.
Abstract
Current LiDAR-based 3D object detection systems are constrained by a limited class vocabulary and high annotation costs. To address this, the paper proposes an open-vocabulary learning approach, benchmarks four baseline solutions, and introduces the Find n’ Propagate method to improve the recall of novel objects. Extensive experiments demonstrate significant improvements in novel recall and average precision for novel object classes.
Introduction to LiDAR-based 3D object detection.
Challenges with limited class vocabulary and high annotation costs.
Exploration of open-vocabulary learning using pre-trained vision-language models.
Design and benchmarking of four baseline solutions for 3D object detection.
Introduction of the Find n’ Propagate approach to maximize recall of novel objects.
Implementation details of the Greedy Box Seeker, Greedy Box Oracle, and Remote Propagator.
Experimental results showcasing improvements in novel recall and average precision across diverse settings.
Stats
Experiments demonstrate a 53% improvement in novel recall across diverse settings, VLMs, and 3D detectors.
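To make the "novel recall" figure concrete, the following is a minimal sketch of how recall restricted to novel classes could be computed. It is not the paper's evaluation code. The function name `novel_recall`, the `(label, x, y)` box format, and the 2.0 m bird's-eye-view center-distance matching rule are all assumptions made for illustration; the summary does not state the matching criterion, nor whether the 53% figure is a relative or an absolute gain.

```python
import math


def novel_recall(gt_boxes, pred_boxes, novel_classes, dist_thresh=2.0):
    """Fraction of novel-class ground-truth objects matched by a prediction.

    Hypothetical format: each box is a (label, x, y) tuple holding a class
    name and a bird's-eye-view center. A ground-truth object counts as
    recalled when an unused prediction of the same class lies within
    dist_thresh meters of it (an assumed matching rule).
    """
    used = set()   # indices of predictions already matched to a ground truth
    matched = 0
    total = 0
    for label, gx, gy in gt_boxes:
        if label not in novel_classes:
            continue  # base-class objects do not count toward novel recall
        total += 1
        best_idx, best_dist = None, dist_thresh
        for i, (plabel, px, py) in enumerate(pred_boxes):
            if i in used or plabel != label:
                continue
            d = math.hypot(px - gx, py - gy)
            if d <= best_dist:
                best_idx, best_dist = i, d
        if best_idx is not None:
            used.add(best_idx)
            matched += 1
    return matched / total if total else 0.0


# Toy example with made-up coordinates: "bus" and "truck" are treated as novel.
gt = [("bus", 0.0, 0.0), ("bus", 10.0, 0.0), ("car", 5.0, 5.0), ("truck", 20.0, 20.0)]
pred = [("bus", 0.5, 0.0), ("car", 5.0, 5.0), ("truck", 30.0, 30.0)]
recall = novel_recall(gt, pred, novel_classes={"bus", "truck"})
print(f"novel recall: {recall:.3f}")
```

In the toy example, only one of the three novel ground-truth objects has a same-class prediction within the threshold, so the recall is one third. Each prediction is matched at most once, so a single detection cannot be credited for two objects.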
Quotes
"Our exploration of open-vocabulary (OV) learning in urban environments aims to capture novel instances using pre-trained vision-language models (VLMs) with multi-sensor data."
"We introduce a universal Find n’ Propagate approach for 3D OV tasks, aimed at maximizing the recall of novel objects."