Key Concepts
TAO-Amodal is a large-scale benchmark for evaluating the ability of object trackers to handle partial and complete occlusions, including objects that are partially out of the camera frame.
Summary
The authors introduce TAO-Amodal, a large-scale dataset for evaluating amodal object tracking. The dataset contains 332k bounding boxes covering 833 object categories across 2,907 video sequences, with annotations for both the visible (modal) extent of each object and its full (amodal) extent, which includes the occluded parts.
The key highlights of the dataset and analysis are:
- TAO-Amodal is significantly larger than prior amodal datasets, covering a much wider range of object categories and occlusion scenarios, including both in-frame and out-of-frame occlusions.
- The authors evaluate state-of-the-art modal object trackers and amodal segmentation methods on TAO-Amodal, finding that they struggle to handle heavy occlusions.
- To address this, the authors explore fine-tuning strategies and data augmentation techniques to adapt modal trackers for amodal tracking. A simple amodal expander module, combined with a synthetic occlusion data augmentation method, leads to notable improvements in detecting and tracking occluded objects.
- The authors also investigate incorporating multi-frame signals, such as Kalman filtering and cross-attended Re-ID features, into the amodal expander to further boost performance on occluded and out-of-frame objects.
- The authors provide a detailed analysis, including an in-depth study on the 'people' category, which is crucial for many real-world applications like autonomous driving.
Overall, TAO-Amodal serves as a comprehensive benchmark to drive progress in amodal object tracking, a critical capability for real-world perception systems.
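The synthetic-occlusion idea in the highlights can be illustrated with a toy example. The sketch below is not the authors' augmentation method; it assumes axis-aligned integer pixel boxes in `(x1, y1, x2, y2)` form and a rectangular occluder, and the function name `paste_occluder` is hypothetical. It shows how pasting an occluder over an object yields a shrunken modal box while the original amodal box remains available as a training target, which is the kind of (modal, amodal) pair a module that expands modal boxes could plausibly learn from.

```python
def paste_occluder(amodal_box, occluder_box):
    """Simulate pasting a rectangular occluder on top of an object.

    Boxes are integer pixel coordinates (x1, y1, x2, y2), with x2/y2 exclusive.
    Returns (modal_box, visibility): the tight box around the pixels that stay
    visible, and the visible fraction of the object's amodal box.
    modal_box is None when nothing remains visible.
    """
    ax1, ay1, ax2, ay2 = amodal_box
    ox1, oy1, ox2, oy2 = occluder_box

    # Collect object pixels that are not covered by the occluder.
    visible = [
        (x, y)
        for y in range(ay1, ay2)
        for x in range(ax1, ax2)
        if not (ox1 <= x < ox2 and oy1 <= y < oy2)
    ]
    if not visible:
        return None, 0.0

    xs = [x for x, _ in visible]
    ys = [y for _, y in visible]
    modal_box = (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

    total_pixels = (ax2 - ax1) * (ay2 - ay1)
    return modal_box, len(visible) / total_pixels


# Hypothetical training pair: the shrunken modal box is the input,
# the untouched amodal box is the regression target.
amodal = (0, 0, 10, 10)
modal, vis = paste_occluder(amodal, (5, 0, 20, 10))
training_pair = {"input_modal": modal, "target_amodal": amodal, "visibility": vis}
```

In this example the occluder hides the right half of the object, so the modal box narrows to the left half. When the occluder only covers a corner, the tight modal box can stay equal to the amodal box even though the visibility drops.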
Statistics
The dataset contains 332k bounding boxes covering 833 object categories across 2,907 video sequences.
There are 139k boxes with partial occlusion (10-80% visibility) and 35.1k boxes with heavy occlusion (less than 10% visibility).
The dataset also includes 9.6k objects that are partially out of the camera frame.
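To make the occlusion buckets above concrete, here is a minimal sketch that sorts a box into them. It assumes visibility is measured as the fraction of the amodal box area covered by the modal box, which may differ from how the dataset's annotations define it. The helper names (`visibility`, `occlusion_level`, `is_partially_out_of_frame`) are hypothetical, and the thresholds are taken from the figures quoted above.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def visibility(modal_box, amodal_box):
    """Fraction of the amodal box covered by the modal box (assumed definition)."""
    amodal_area = box_area(amodal_box)
    if modal_box is None or amodal_area == 0:
        return 0.0
    # Clip the modal box to the amodal box before measuring overlap.
    clipped = (
        max(modal_box[0], amodal_box[0]),
        max(modal_box[1], amodal_box[1]),
        min(modal_box[2], amodal_box[2]),
        min(modal_box[3], amodal_box[3]),
    )
    return box_area(clipped) / amodal_area


def occlusion_level(vis):
    """Bucket a visibility fraction using the 10% and 80% thresholds."""
    if vis < 0.10:
        return "heavy"
    if vis <= 0.80:
        return "partial"
    return "visible"


def is_partially_out_of_frame(amodal_box, width, height):
    """True if the amodal box crosses the image border but still overlaps the image."""
    x1, y1, x2, y2 = amodal_box
    overlaps = x1 < width and y1 < height and x2 > 0 and y2 > 0
    crosses = x1 < 0 or y1 < 0 or x2 > width or y2 > height
    return overlaps and crosses
```

For example, an object whose amodal box is `(0, 0, 10, 10)` but whose modal box is only `(0, 0, 5, 10)` has a visibility of 0.5 and falls in the partial bucket. An object with no modal box at all is treated as fully occluded and falls in the heavy bucket.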
Quotes
"Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants."
"To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences."
"We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion."