VastTrack is a new benchmark offering a vast array of object categories and videos for comprehensive visual tracking evaluation. It surpasses existing benchmarks in diversity and scale, and provides rich annotations for both vision-only and vision-language tracking. Meticulous manual labeling ensures high-quality annotations. An evaluation of 25 representative trackers reveals significant performance drops relative to other datasets, attributed to the lack of sufficiently diverse training data.
VastTrack covers an extensive range of object classes, listed alphabetically from "Aardwolf" to "Azure-Winged Magpie" and beyond. Each class contains a varying number of video sequences, contributing to the dataset's richness and diversity. The benchmark also pairs bounding box annotations with linguistic descriptions to support vision-language tracking.
The evaluation results on VastTrack demonstrate the challenges existing trackers face in diverse scenarios, including background clutter, scale variation, deformation, invisibility, motion blur, rotation, and low resolution. Although some trackers that leverage temporal information or advanced architectures such as Transformers show promising results, considerable room remains for improvement toward universal object tracking.
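The summary does not specify which metrics were used, but tracking benchmarks in this family conventionally report a success score: the fraction of frames whose predicted-box IoU with the ground truth exceeds an overlap threshold, averaged over a sweep of thresholds. A minimal sketch of that standard computation (function names and the 21-point threshold sweep are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    xa = max(box_a[0], box_b[0])
    ya = max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def success_score(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Average, over a sweep of overlap thresholds, of the fraction of
    frames whose predicted box reaches at least that IoU with the
    ground truth (the usual area-under-the-success-curve measure)."""
    ious = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean([(ious >= t).mean() for t in thresholds]))
```

Per-sequence scores are then averaged across the benchmark to rank trackers; attribute-specific scores (e.g., for motion blur or low resolution) restrict the average to frames carrying that attribute.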
Further experiments reveal that retraining existing trackers on VastTrack improves performance both on VastTrack itself and on other benchmarks such as LaSOT, highlighting the value of its diverse training data for strengthening tracking algorithms.
Key insights from the source content, by Liang Peng, J... at arxiv.org, 03-07-2024
https://arxiv.org/pdf/2403.03493.pdf