Core Concepts
Efficient RGB-T tracking achieved through mutual prompt learning and knowledge distillation.
Abstract
The paper describes a shift from a two-stream to a one-stream RGB-T tracking architecture. It introduces an approach based on cross-modal mutual prompt learning, which is used to build a teacher model with a high precision rate. That teacher then guides a one-stream student model through knowledge distillation, giving faster inference. The paper also covers the challenges of RGB-T tracking, existing model architectures, and data synthesis issues. Extensive experiments compare the proposed method with existing RGB-T trackers and show that it is effective.
Abstract:
Focus on efficient RGB-T tracking.
Novel approach based on mutual prompt learning.
Improved precision rates and faster inference speeds.
Introduction:
Importance of thermal infrared imaging in object tracking.
Evolution from anchor-based to Transformer-based solutions.
Challenges in RGB-T tracking addressed by new architectures.
Method:
Design of teacher model for feature extraction and fusion.
Multi-modal mutual prompter for adaptive modality identification.
Hierarchical knowledge distillation for student model training.
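The distillation step listed above can be illustrated with a minimal sketch. It assumes a generic two-level scheme: feature matching at several layers plus a temperature-softened match on the output responses. The function names, the choice of MSE and KL divergence, the weights, and the temperature are all illustrative assumptions and are not taken from the paper.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def hierarchical_distillation_loss(teacher_feats, student_feats,
                                   teacher_logits, student_logits,
                                   feat_weight=1.0, resp_weight=1.0,
                                   temperature=2.0):
    """Hypothetical loss: per-layer feature MSE plus softened response KL.

    teacher_feats / student_feats: lists of feature vectors, one per layer.
    teacher_logits / student_logits: flattened response scores.
    """
    # Feature level: average the MSE over all matched layers.
    feat_loss = sum(mse(t, s) for t, s in zip(teacher_feats, student_feats))
    feat_loss /= len(teacher_feats)
    # Response level: compare softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    resp_loss = kl_divergence(p_teacher, p_student) * temperature ** 2
    return feat_weight * feat_loss + resp_weight * resp_loss
```

When the student reproduces the teacher's features and responses exactly, the loss is zero. Any mismatch at either level increases it, which is the signal used to train the student.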
Experiments:
Evaluation on public datasets GTOT, RGBT234, LasHeR, VTUAV-ST, VTUAV-LT.
Comparison with state-of-the-art methods showcasing superior performance.
Attribute-based evaluation demonstrating effectiveness under challenging conditions.
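The evaluation reports precision and success figures. As a rough illustration of how such metrics are commonly computed in tracking benchmarks, the sketch below scores predicted boxes against ground truth. The (x, y, w, h) box format, the 20-pixel center-error threshold, and the single 0.5 overlap threshold are assumptions for illustration. Benchmarks may use other thresholds (smaller ones for datasets with small targets) and usually report the area under the success curve rather than a single operating point.

```python
import math

def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return math.hypot(ax - bx, ay - by)

def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_rate(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

def success_rate(preds, gts, iou_threshold=0.5):
    """Fraction of frames whose overlap with ground truth exceeds the threshold."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(preds, gts))
    return hits / len(gts)

# Two frames: the tracker is exact on the first and lost on the second.
predictions = [(0, 0, 10, 10), (0, 0, 10, 10)]
ground_truth = [(0, 0, 10, 10), (100, 100, 10, 10)]
print(precision_rate(predictions, ground_truth))  # 0.5
print(success_rate(predictions, ground_truth))    # 0.5
```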
Stats
"GTOT: 92.6 / 77.5
RGBT234: 88.3 / 66.1
LasHeR: 71.4 / 56.7"
Quotes
"Our designed teacher model achieved the highest precision rate."
"The student model realized an inference speed more than three times faster than the teacher model."