The paper presents a novel Attention-based Fusion Router (AFter) for RGBT (RGB-thermal) tracking. Existing RGBT tracking methods often adopt fixed fusion structures to integrate multi-modal features, and these fixed structures struggle to handle the diverse challenges of dynamic scenarios.
To address this issue, AFter introduces a Hierarchical Attention Network (HAN) that provides a dynamic fusion structure space. HAN consists of four different attention-based fusion units: spatial enhancement, channel enhancement, and two cross-modal enhancement units. These units are stacked in multiple layers to expand the fusion structure space. Importantly, each fusion unit is embedded with a router to predict the combination weights, allowing AFter to dynamically select the optimal fusion structure for the current scenario.
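The routing idea can be illustrated with a minimal sketch of a single fusion layer. This is not the authors' implementation: the four unit functions are simple arithmetic stand-ins for the attention-based units, and the names `route`, `fusion_layer`, and `router_params`, the mean-activation router, and the softmax normalization are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for the four fusion units named in the summary.
# The real units are attention-based; these only make the sketch runnable.
def spatial_enhance(rgb, tir):
    return [(r + t) / 2 for r, t in zip(rgb, tir)]


def channel_enhance(rgb, tir):
    return [max(r, t) for r, t in zip(rgb, tir)]


def cross_rgb_to_tir(rgb, tir):
    return [t + 0.5 * r for r, t in zip(rgb, tir)]


def cross_tir_to_rgb(rgb, tir):
    return [r + 0.5 * t for r, t in zip(rgb, tir)]


UNITS = [spatial_enhance, channel_enhance, cross_rgb_to_tir, cross_tir_to_rgb]


def route(rgb, tir, router_params):
    """Toy router (assumption): one logit per unit, computed from the mean
    activation of each modality, then normalized into combination weights."""
    mean_rgb = sum(rgb) / len(rgb)
    mean_tir = sum(tir) / len(tir)
    logits = [a * mean_rgb + b * mean_tir for a, b in router_params]
    return softmax(logits)


def fusion_layer(rgb, tir, router_params):
    """Run every unit, then blend the outputs with the predicted weights."""
    weights = route(rgb, tir, router_params)
    outputs = [unit(rgb, tir) for unit in UNITS]
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(rgb))
    ]
    return fused, weights


# Usage: with all-zero router parameters every unit gets the same weight.
rgb_feat = [1.0, 2.0]
tir_feat = [3.0, 0.0]
uniform_params = [(0.0, 0.0)] * 4
fused, weights = fusion_layer(rgb_feat, tir_feat, uniform_params)
```

Because the weights depend on the input features, different inputs select different blends of the units, which is the sense in which the fusion structure is dynamic. Stacking several such layers, as the summary describes, would enlarge the set of reachable structures.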
Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of AFter compared to state-of-the-art RGBT trackers. The dynamic fusion structure of HAN enables AFter to handle various challenges effectively, outperforming fixed fusion methods. Visualization results further confirm that AFter can dynamically adjust the fusion structure based on the input complexity.
Key insights distilled from the source by Andong Lu, Wa... at arxiv.org, 05-07-2024
https://arxiv.org/pdf/2405.02717.pdf