The paper introduces Zero-shot Generic Multiple Object Tracking (Z-GMOT), a tracking paradigm that addresses the limitations of existing Multiple Object Tracking (MOT) and Generic Multiple Object Tracking (GMOT) approaches.
The key contributions are:
Introduction of the Referring GMOT dataset, which extends existing GMOT datasets by incorporating detailed textual descriptions of video attributes.
Proposal of iGLIP, an enhanced version of the GLIP vision-language model, to detect objects with specific characteristics without prior training on the target object categories.
Introduction of MA-SORT, a novel tracking algorithm that seamlessly integrates motion and appearance-based matching strategies to handle objects with high visual similarity.
The Z-GMOT framework follows a tracking-by-detection approach. iGLIP is used for the object detection stage, while MA-SORT is employed for the object association stage. Extensive experiments on the Referring GMOT, DanceTrack, and MOT20 datasets demonstrate the effectiveness and generalizability of the proposed Z-GMOT framework in tracking unseen object categories.
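To make the association stage more concrete, the sketch below shows one generic way to combine a motion cue (box overlap) with an appearance cue (embedding similarity) when matching existing tracks to new detections. It is not the MA-SORT algorithm from the paper: the function names, the weight `alpha`, the threshold `min_score`, and the greedy matching step are all assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance embeddings (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def associate(tracks, detections, alpha=0.5, min_score=0.3):
    """Greedily match tracks to detections.

    tracks, detections: lists of (box, embedding) pairs.
    alpha: hypothetical weight of the motion cue versus the appearance cue.
    min_score: hypothetical threshold below which a pair is never matched.
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    scored = []
    for ti, (tbox, temb) in enumerate(tracks):
        for di, (dbox, demb) in enumerate(detections):
            score = alpha * iou(tbox, dbox) + (1 - alpha) * cosine(temb, demb)
            if score >= min_score:
                scored.append((score, ti, di))

    # Take the best-scoring pairs first, using each track and detection once.
    scored.sort(reverse=True)
    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in scored:
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return sorted(matches)


# Two tracks and two detections listed in swapped order.
tracks = [((0, 0, 10, 10), (1.0, 0.0)), ((20, 0, 30, 10), (0.0, 1.0))]
detections = [((21, 0, 31, 10), (0.0, 1.0)), ((1, 0, 11, 10), (1.0, 0.0))]
matches = associate(tracks, detections)
```

Greedy matching keeps the example dependency-free. A globally optimal assignment, for example the Hungarian algorithm as provided by `scipy.optimize.linear_sum_assignment`, is a common alternative when many similar-looking objects compete for the same detections.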
Key insights distilled from
by Kim Hoang Tr... at arxiv.org, 04-16-2024
https://arxiv.org/pdf/2305.17648.pdf