The paper introduces a novel tracking paradigm called Zero-shot Generic Multiple Object Tracking (Z-GMOT) that addresses the limitations of existing Multiple Object Tracking (MOT) and Generic Multiple Object Tracking (GMOT) approaches.
The key contributions are:
Introduction of the Referring GMOT dataset, which extends existing GMOT datasets by incorporating detailed textual descriptions of video attributes.
Proposal of iGLIP, an enhanced version of the GLIP vision-language model, which detects objects with specific characteristics without requiring training on the target object categories.
Introduction of MA-SORT, a novel tracking algorithm that combines motion-based and appearance-based matching strategies to handle objects with high visual similarity.
The Z-GMOT framework follows a tracking-by-detection approach. iGLIP is used for the object detection stage, while MA-SORT is employed for the object association stage. Extensive experiments on the Referring GMOT, DanceTrack, and MOT20 datasets demonstrate the effectiveness and generalizability of the proposed Z-GMOT framework in tracking unseen object categories.
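The association stage of a tracking-by-detection pipeline can be illustrated with a minimal sketch. This is not the MA-SORT algorithm itself, whose details are not given here; it only shows the general idea of blending a motion cue (box overlap) with an appearance cue (feature similarity) into one matching cost. The function names, the `motion_weight` and `max_cost` parameters, and the greedy matching strategy are all assumptions made for illustration; detections are assumed to arrive from an upstream detector as `(box, feature)` pairs.

```python
from math import sqrt


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def associate(tracks, detections, motion_weight=0.5, max_cost=0.7):
    """Greedily match tracks to detections on a blended cost.

    tracks: dict mapping track_id -> (box, feature)
    detections: list of (box, feature) from the detection stage
    motion_weight, max_cost: hypothetical tuning parameters
    Returns (matches, unmatched) where matches maps track_id -> detection
    index and unmatched lists detection indices left without a track.
    """
    candidates = []
    for track_id, (t_box, t_feat) in tracks.items():
        for j, (d_box, d_feat) in enumerate(detections):
            motion_cost = 1.0 - iou(t_box, d_box)
            appearance_cost = 1.0 - cosine_similarity(t_feat, d_feat)
            cost = (motion_weight * motion_cost
                    + (1.0 - motion_weight) * appearance_cost)
            if cost <= max_cost:  # discard implausible pairings
                candidates.append((cost, track_id, j))

    candidates.sort()  # cheapest pairings are accepted first
    matches, used = {}, set()
    for _cost, track_id, j in candidates:
        if track_id in matches or j in used:
            continue
        matches[track_id] = j
        used.add(j)

    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

In a full tracker, unmatched detections would typically start new tracks, and the track boxes would come from a motion model's prediction rather than the previous frame. Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment solver could be substituted without changing the cost definition.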
Key insights distilled from:
by Kim Hoang Tr... at arxiv.org, 04-16-2024
https://arxiv.org/pdf/2305.17648.pdf