Core Concepts
ODTFormer is a Transformer-based model for efficient obstacle detection and tracking with stereo cameras, achieving state-of-the-art performance on both tasks.
Abstract
The paper introduces ODTFormer, a model for obstacle detection and tracking using stereo cameras. It addresses the challenges of dynamic environments, such as moving pedestrians, by leveraging deformable attention for cost volume construction. The model is optimized end-to-end for efficiency and accuracy, outperforming existing methods while significantly reducing computational cost. Extensive experiments on the DrivingStereo and KITTI benchmarks validate its effectiveness.
I. Introduction
Obstacle detection is crucial for autonomous navigation.
The work focuses on stereo camera-based approaches.
Previous approaches rely on separate depth estimation modules.
II. Proposed Method
ODTFormer uses deformable attention for cost volume construction.
It introduces a novel obstacle tracking method with physical constraints.
The entire model is optimized in an end-to-end manner.
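The deformable-attention cost volume can be pictured as follows: each voxel query projects into both stereo views and attends to a small set of learned sampling offsets around its projection, and the aggregated left/right features are correlated to form a matching cost. The sketch below is a simplified NumPy illustration of this idea, not the paper's implementation; the offsets and weights would be predicted by the network, and here they are fixed hypothetical values.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample a (H, W, C) feature map at a continuous (x, y)."""
    H, W, _ = feat.shape
    x = np.clip(x, 0.0, W - 1.001)
    y = np.clip(y, 0.0, H - 1.001)
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * feat[y0, x0]
            + dx * (1 - dy) * feat[y0, x0 + 1]
            + (1 - dx) * dy * feat[y0 + 1, x0]
            + dx * dy * feat[y0 + 1, x0 + 1])

def voxel_matching_cost(left_feat, right_feat, u, v, disparity,
                        offsets, weights):
    """Cost for one voxel: aggregate features at learned offsets around
    its projection in each view, then correlate the two aggregates.
    In the model the offsets/weights come from deformable attention;
    here they are supplied by the caller for illustration."""
    fl = sum(w * bilinear_sample(left_feat, u + ox, v + oy)
             for (ox, oy), w in zip(offsets, weights))
    fr = sum(w * bilinear_sample(right_feat, u - disparity + ox, v + oy)
             for (ox, oy), w in zip(offsets, weights))
    return float(fl @ fr)  # higher dot product = stronger match
```

Sampling only a handful of offsets per voxel, rather than densely correlating every disparity, is what keeps the cost volume construction cheap.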
III. Experiments
A. Obstacle Detection
ODTFormer outperforms depth-based methods on the IoU and Chamfer Distance (CD) metrics.
It achieves FPS comparable to StereoVoxelNet with higher accuracy.
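The two detection metrics above are standard and easy to state concretely: IoU compares predicted and ground-truth occupancy grids cell-wise, while Chamfer Distance measures the average nearest-neighbor distance between the two sets of occupied points. A minimal reference implementation (my own sketch, not the paper's evaluation code):

```python
import numpy as np

def voxel_iou(pred, gt):
    """IoU between two boolean occupancy grids of identical shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two (N, 3) point sets:
    mean distance from each point to its nearest neighbor in the
    other set, summed over both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Lower CD and higher IoU both indicate closer agreement with the ground-truth obstacle set.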
B. Obstacle Tracking
Fine-tuned ODTFormer outperforms baseline methods.
It achieves results comparable to RAFT-3D with significantly fewer MAC (multiply-accumulate) operations.
IV. Ablation Studies
A. Obstacle Detection
Geometric constraints are essential for improved detection accuracy.
Matching cost refinement is crucial for optimal performance.
B. Voxel Tracking
A bounded search design is necessary to prevent out-of-memory errors during tracking.
Tracking all voxels yields better overall tracking accuracy.
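The two ablation findings above can be made concrete with a toy matcher: every occupied voxel in the previous frame is tracked (not just a subset), but each one searches for its match only within a bounded neighborhood, reflecting the physical constraint that obstacles move a limited distance between consecutive frames. This is a hypothetical illustration of the bounded-design idea, not the paper's tracking module:

```python
import numpy as np

def track_voxels(prev_occ, curr_occ, max_disp=2):
    """Match each occupied voxel in the previous frame to the nearest
    occupied voxel in the current frame, searching only within a
    bounded window of +/- max_disp cells per axis. Bounding the search
    keeps the candidate set (and hence memory use) small and constant
    per voxel, instead of growing with the whole grid."""
    prev_pts = np.argwhere(prev_occ)
    curr_pts = np.argwhere(curr_occ)
    flows = {}
    for p in prev_pts:
        diffs = curr_pts - p
        within = np.all(np.abs(diffs) <= max_disp, axis=1)
        if not within.any():
            continue  # no candidate inside the bounded window: unmatched
        cand = diffs[within]
        best = cand[np.argmin(np.linalg.norm(cand, axis=1))]
        flows[tuple(p)] = tuple(best)  # per-voxel displacement (flow)
    return flows
```

In the full model the matching is learned rather than nearest-neighbor, but the bounded window plays the same role: it is what prevents the pairwise candidate set from exhausting memory.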
Stats
ODTFormer achieves state-of-the-art performance on the obstacle detection task.
The model runs at 20 FPS on an RTX A5000 GPU without post-processing such as quantization or pruning.