The paper introduces a novel tracking paradigm, Zero-shot Generic Multiple Object Tracking (Z-GMOT), that addresses the limitations of existing Multiple Object Tracking (MOT) and Generic Multiple Object Tracking (GMOT) approaches.
The key contributions are:
Introduction of the Referring GMOT dataset, which extends existing GMOT datasets by incorporating detailed textual descriptions of video attributes.
Proposal of iGLIP, an improved version of the GLIP vision-language model, to detect objects with specific characteristics without relying on prior training on the target categories.
Introduction of MA-SORT, a novel tracking algorithm that integrates motion- and appearance-based matching to handle objects with high visual similarity (see the sketch after this list).
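The paper's exact MA-SORT formulation is not reproduced in this summary; below is a minimal sketch of how motion and appearance cues can be combined into one association cost, assuming a fixed-weight mix of an IoU-based motion cost and a cosine appearance cost. The names `fuse_costs` and `motion_weight`, and the fixed weighting itself, are illustrative assumptions, not the paper's method.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_costs(track_boxes, det_boxes, track_feats, det_feats, motion_weight=0.5):
    """Cost matrix mixing (1 - IoU) motion cost with cosine appearance distance."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i, (t_box, t_feat) in enumerate(zip(track_boxes, track_feats)):
        for j, (d_box, d_feat) in enumerate(zip(det_boxes, det_feats)):
            motion_cost = 1.0 - iou(t_box, d_box)
            app_cost = 1.0 - np.dot(t_feat, d_feat) / (
                np.linalg.norm(t_feat) * np.linalg.norm(d_feat) + 1e-9)
            cost[i, j] = motion_weight * motion_cost + (1.0 - motion_weight) * app_cost
    return cost
```

The fused matrix can then be fed to a standard assignment solver, e.g. `scipy.optimize.linear_sum_assignment(cost)`, to match tracks to detections; leaning more on the appearance term is what helps when objects move similarly but look subtly different, and vice versa when they look alike but move apart.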
The Z-GMOT framework follows a tracking-by-detection approach. iGLIP is used for the object detection stage, while MA-SORT is employed for the object association stage. Extensive experiments on the Referring GMOT, DanceTrack, and MOT20 datasets demonstrate the effectiveness and generalizability of the proposed Z-GMOT framework in tracking unseen object categories.
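As a rough illustration of this tracking-by-detection flow, the sketch below runs per-frame detection conditioned on a text prompt and then associates detections across frames. The `detector` and `tracker` callables stand in for iGLIP and MA-SORT; their signatures are assumptions for illustration, not the released API.

```python
def run_z_gmot(video_frames, text_prompt, detector, tracker):
    """Tracking-by-detection loop: prompt-conditioned detection, then association."""
    results = []
    for frame_id, frame in enumerate(video_frames):
        # Detection stage: boxes, scores, and appearance features for the
        # category described by the text prompt (iGLIP-style, zero-shot).
        detections = detector(frame, text_prompt)
        # Association stage: assign detections to existing tracks using
        # fused motion/appearance costs (MA-SORT-style).
        tracks = tracker.update(detections, frame)
        results.append((frame_id, tracks))
    return results
```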
Key insights extracted from arxiv.org, by Kim Hoang Tr..., 04-16-2024
https://arxiv.org/pdf/2305.17648.pdf