Core Concepts
A two-stream RGB-T tracking architecture is transformed into a one-stream model through mutual prompt learning and knowledge distillation, improving efficiency.
Abstract
The paper traces the evolution of RGB-T tracking from two-stream to one-stream architectures. It introduces mutual prompt learning and knowledge distillation to strengthen fusion between the visible light and thermal infrared modalities, improving both precision rate and inference speed. It details the challenges faced by current RGB-T models, proposes a novel architecture, and presents experimental results demonstrating the effectiveness of the proposed method.
Abstract:
Fusion of visible light and thermal images in RGB-T tracking.
Novel two-stream to one-stream transformation via mutual prompt learning.
Improved precision rate and faster inference speed demonstrated in experiments.
Introduction:
Importance of RGB-T tracking with visible light and thermal imaging.
Challenges in existing models due to annotation costs and computational burden.
Proposal for a new one-stream architecture for efficient feature extraction.
Method:
Teacher model designed by extending OSTrack with a Siamese architecture.
Introduction of Multi-Modal Mutual Prompter for adaptive modality identification.
Hierarchical knowledge distillation strategy from teacher to student model.
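The hierarchical teacher-to-student distillation above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the MSE feature term, the temperature-softened KL response term, and the weights `t` and `alpha` are all assumptions standing in for whichever hierarchy of losses the authors actually use.

```python
import numpy as np

def softmax(x, t=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.exp(x / t - np.max(x / t))
    return z / z.sum()

def kd_loss(student_feat, teacher_feat, student_logits, teacher_logits,
            t=2.0, alpha=0.5):
    """Two-level (hence 'hierarchical') distillation sketch:
    an intermediate feature-matching term plus a response-level KL term."""
    # Feature level: match the student's intermediate features to the teacher's.
    feat_loss = np.mean((student_feat - teacher_feat) ** 2)
    # Response level: KL divergence between temperature-softened output
    # distributions (teacher as target).
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))
    # t**2 rescaling keeps gradient magnitude comparable across temperatures.
    return alpha * feat_loss + (1 - alpha) * (t ** 2) * kl
```

When student and teacher agree exactly, both terms vanish and the loss is zero; in training, the feature term transfers the teacher's fused RGB-T representation while the KL term transfers its tracking response.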
Experiments:
Training details including loss functions used.
Evaluation on public datasets GTOT, RGBT234, LasHeR, VTUAV-ST, VTUAV-LT.
Comparison with state-of-the-art methods showing superior performance.
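The precision rate reported in the comparisons is, by the usual RGB-T benchmark convention, the fraction of frames whose predicted center falls within a pixel threshold of the ground-truth center. A small sketch of that metric, assuming the common 20-pixel threshold (GTOT evaluations often use a smaller threshold because of its small targets; the summary does not specify which the authors used):

```python
import numpy as np

def precision_rate(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose center location error is within `threshold` px.

    pred_centers, gt_centers: arrays of shape (num_frames, 2) holding (x, y).
    """
    errors = np.linalg.norm(pred_centers - gt_centers, axis=1)
    return float(np.mean(errors <= threshold))
```

Sweeping the threshold from 0 to 50 px and plotting the resulting curve gives the standard precision plot used to rank trackers on GTOT, RGBT234, and LasHeR.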
Data Extraction:
"Teacher model achieved a precision rate of 92.6%."
"Student model realized an inference speed more than three times faster than the teacher model."
Stats
The teacher model achieved a precision rate of 92.6%.
The student model realized an inference speed more than three times faster than the teacher model.
Quotes
"Our designed teacher model achieved a precision rate of 92.6%."
"Our trained student model even slightly outperformed the teacher model."