This paper proposes a new method that estimates depth information from RGB images to build a robust object tracking system that is resilient to occlusion and motion blur.
Integrating depth information through a novel depth attention mechanism significantly enhances the robustness of RGB-based visual object tracking, particularly in challenging scenarios like occlusions and motion blur, without requiring RGB-D cameras.
This work proposes a multi-attention associate prediction network for visual tracking that improves feature matching and decision alignment, achieving leading performance on various benchmarks.
This work proposes AQATrack, an adaptive tracker that uses autoregressive queries to capture spatio-temporal information effectively.