Core Concept
VastTrack introduces a benchmark with diverse object categories and videos to enhance general object tracking.
Summary
VastTrack is a large-scale benchmark for object tracking, offering 2,115 object categories and 50,610 video sequences. It aims to improve general object tracking by providing rich annotations and diverse scenarios. The benchmark facilitates the development of vision-only and vision-language tracking systems.
Object Categories
VastTrack includes 2,115 object categories, organized hierarchically.
Categories range from animals to accessories, vehicles, instruments, and more.
Each category is verified by experts for suitability in tracking.
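A hierarchical category list of this kind can be pictured as a nested mapping. The sketch below is illustrative only: the top-level groups echo the examples named above, while the leaf names and the `count_leaves` helper are hypothetical and are not taken from the VastTrack taxonomy.

```python
# Hypothetical fragment of a category hierarchy; the leaf names are invented
# for illustration and are not the actual VastTrack categories.
hierarchy = {
    "animal": {"bird": ["sparrow", "eagle"], "mammal": ["zebra"]},
    "vehicle": ["bicycle", "kayak"],
    "instrument": ["violin"],
}


def count_leaves(node):
    """Count leaf categories in a nested dict/list hierarchy."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return len(node)  # a list holds leaf category names


leaf_total = count_leaves(hierarchy)
```

Counting leaves rather than internal nodes matches how a benchmark would report its number of trackable categories, assuming only leaves are used as tracking targets.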
Dataset Construction
VastTrack comprises 50,610 video sequences with 4.2 million frames.
Videos are manually labeled with bounding boxes and linguistic descriptions.
The dataset is carefully annotated with multiple rounds of inspection.
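The headline counts imply some rough averages. The short calculation below derives them from the figures stated above; because "4.2 million" is a rounded value, the results are approximate, and they say nothing about how frames or videos are actually distributed across categories.

```python
# Figures as reported in the summary; the frame count is rounded.
num_categories = 2_115
num_sequences = 50_610
num_frames = 4_200_000  # "4.2 million" in the text

avg_frames_per_sequence = num_frames / num_sequences
avg_sequences_per_category = num_sequences / num_categories

print(f"{avg_frames_per_sequence:.0f} frames per sequence on average")
print(f"{avg_sequences_per_category:.0f} sequences per category on average")
```

Under these rounded figures, a sequence averages roughly 83 frames and a category roughly 24 sequences.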
Evaluation
Twenty-five trackers are evaluated on VastTrack, and they show performance drops relative to their results on other benchmarks.
The top-performing trackers are built on Vision Transformer architectures.
Data Acquisition
Object categories are selected from various sources and organized in a hierarchical structure.
Videos are sourced from YouTube, resulting in a diverse dataset.
Annotation
Videos are annotated with bounding boxes and linguistic descriptions.
Annotations are refined through a multi-step process to ensure accuracy.
Attributes
Test videos in VastTrack are evaluated based on ten attributes like invisibility, deformation, and scale variation.
Attributes influence tracking performance and provide insights into tracker capabilities.
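Attribute-based analysis usually comes down to averaging a tracker's per-video score over the videos that carry each attribute. The sketch below shows one way to do that; the scores, the `results` layout, and the `mean_score_by_attribute` helper are all hypothetical and do not reflect the VastTrack toolkit or its reported numbers.

```python
from collections import defaultdict

# Hypothetical per-video results: (score, attributes present in the video).
results = [
    (0.60, {"deformation"}),
    (0.40, {"invisibility", "scale variation"}),
    (0.20, {"invisibility"}),
]


def mean_score_by_attribute(results):
    """Average the scores over the videos that exhibit each attribute."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, attributes in results:
        for attribute in attributes:
            totals[attribute] += score
            counts[attribute] += 1
    return {name: totals[name] / counts[name] for name in totals}


per_attribute = mean_score_by_attribute(results)
```

A video with several attributes contributes to each of them, so the per-attribute averages overlap and should not be summed or treated as a partition of the test set.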
Statistics
VastTrack provides 2,115 object categories and 50,610 video sequences.
Quotes
"VastTrack introduces a benchmark with diverse object categories and videos to enhance general object tracking."
"The dataset is carefully annotated with multiple rounds of inspection."