Core Concepts
Equipping a tracker that follows the Siamese paradigm with precise, up-to-date historical information about the target yields a significant performance improvement without retraining the entire model.
Abstract
The paper proposes HIPTrack, a novel visual tracking method that uses a historical prompt network to make effective use of the target's historical information. The historical prompt network consists of an encoder and a decoder:
The historical prompt encoder encodes the target's position information and visual features from the current frame into a historical target feature, which is then appended to a memory bank.
The historical prompt decoder retrieves the historical target features from the memory bank and adaptively aggregates them with the current search region feature to generate a historical prompt. This prompt is then concatenated with the search region feature and fed into the prediction head.
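The encoder, memory bank and decoder described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not HIPTrack's actual architecture: the additive mask-based position encoding in `encode_history`, the FIFO eviction in `MemoryBank`, the single-head dot-product cross-attention in `decode_prompt` and the channel-wise concatenation in `build_head_input` are all hypothetical choices, and the prediction head itself is omitted.

```python
import math
from collections import deque
from typing import List

Vector = List[float]


def encode_history(features: List[Vector], mask: List[float], pos_embed: Vector) -> List[Vector]:
    """Hypothetical encoder: inject target-position info into each feature token
    by adding mask[i] * pos_embed (mask is 1.0 inside the target box, 0.0 outside)."""
    return [[f + m * p for f, p in zip(feat, pos_embed)] for feat, m in zip(features, mask)]


class MemoryBank:
    """Stores historical target features for the most recent `capacity` frames.
    Dropping the oldest frame when full (FIFO) is an assumption."""

    def __init__(self, capacity: int) -> None:
        self.frames = deque(maxlen=capacity)

    def append(self, tokens: List[Vector]) -> None:
        self.frames.append(tokens)

    def all_tokens(self) -> List[Vector]:
        # Flatten every stored frame into one list of historical tokens.
        return [tok for frame in self.frames for tok in frame]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prompt(search: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Hypothetical decoder: scaled dot-product cross-attention where search-region
    tokens are queries and stored historical tokens serve as keys and values."""
    dim = len(search[0])
    prompt = []
    for q in search:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory]
        weights = softmax(scores)
        prompt.append([sum(w * v[d] for w, v in zip(weights, memory)) for d in range(dim)])
    return prompt


def build_head_input(search: List[Vector], prompt: List[Vector]) -> List[Vector]:
    """Concatenate each search token with its prompt token along the channel axis
    (the concatenation axis is an assumption); the result would feed the prediction head."""
    return [s + p for s, p in zip(search, prompt)]


# Usage: one past frame with two tokens, the first of which lies inside the target box.
bank = MemoryBank(capacity=3)
bank.append(encode_history([[1.0, 0.0], [0.0, 1.0]], mask=[1.0, 0.0], pos_embed=[0.5, 0.5]))
search = [[1.0, 0.0], [0.0, 1.0]]
prompt = decode_prompt(search, bank.all_tokens())
head_in = build_head_input(search, prompt)
print(len(head_in), len(head_in[0]))  # 2 4
```

Cross-attention lets each search-region token weight the stored historical tokens by similarity, which is one plausible reading of "adaptively aggregates"; a trained network would add learned projections, multiple heads and stacked layers.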
Experiments on multiple benchmarks demonstrate that HIPTrack outperforms current state-of-the-art trackers, especially in handling complex scenarios such as occlusion, deformation and scale variation. The historical prompt network can also be seamlessly integrated as a plug-and-play module to enhance the performance of existing trackers.
Stats
The paper does not provide any specific numerical data or statistics to support its key claims.