Core Concepts
A self-supervised transformer-based approach, termed LTFormer, generates robust feature descriptors to effectively match keypoints between visible and near-infrared remote sensing images.
Abstract
The paper proposes a novel self-supervised matching network, LTFormer, to address the challenge of matching heterogeneous remote sensing images, particularly visible and near-infrared (NIR) image pairs. The key aspects of the approach are:
Feature Point Detection: The framework applies an off-the-shelf feature point detector, such as SIFT, to detect keypoints on the visible and NIR images (a detection-and-cropping sketch follows this list).
Patch Extraction: Image patches are extracted around the detected feature points from both the visible and NIR images.
Deep Feature Descriptor Generation: A lightweight transformer-based network, LTFormer, generates deep-level feature descriptors for the extracted patches, capturing robust and discriminative representations (an illustrative encoder sketch appears after this list).
Self-supervised Training: A self-supervised training approach is adopted, where triplet patches (anchor, positive, negative) are formed using homography transformations. This allows the model to learn feature representations without the need for annotated data.
LT Loss Function: An innovative triplet loss function, LT Loss, is introduced to enhance matching performance by pushing anchor-negative descriptor distances apart while pulling anchor-positive distances together (a triplet-construction and loss sketch follows this list).
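To make the first two steps concrete, here is a minimal sketch of keypoint detection and patch cropping, assuming OpenCV's SIFT detector and a hypothetical 64x64 patch size; the paper's exact detector settings and patch size may differ.

```python
# Sketch of steps 1-2: SIFT detection plus square patch cropping.
# The 64x64 patch size and keypoint cap are placeholder choices.
import cv2
import numpy as np

def detect_keypoints(gray_img, max_kp=500):
    """Detect SIFT keypoints; any detector with (x, y) outputs would do."""
    sift = cv2.SIFT_create(nfeatures=max_kp)
    return sift.detect(gray_img, None)

def extract_patches(img, keypoints, patch_size=64):
    """Crop a square patch around each keypoint, skipping border points."""
    half = patch_size // 2
    h, w = img.shape[:2]
    patches = []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if half <= x < w - half and half <= y < h - half:
            patches.append(img[y - half:y + half, x - half:x + half])
    return (np.stack(patches) if patches
            else np.empty((0, patch_size, patch_size), dtype=img.dtype))

# Synthetic stand-in for a real visible-band image; the NIR image
# would be processed the same way.
visible = (np.random.rand(512, 512) * 255).astype(np.uint8)
kps = detect_keypoints(visible)
vis_patches = extract_patches(visible, kps)
print(vis_patches.shape)  # (N, 64, 64)
```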
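The descriptor network is summarized above only at a high level. The following is an illustrative PyTorch sketch of a lightweight transformer patch encoder, not the exact LTFormer architecture; the token size, embedding width, depth, head count, and output dimension are all placeholder choices.

```python
# Illustrative lightweight transformer patch encoder (NOT the exact
# LTFormer architecture). Dimensions below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDescriptor(nn.Module):
    def __init__(self, patch=64, token=8, dim=128, depth=4, heads=4, out_dim=128):
        super().__init__()
        # Split the 64x64 patch into 8x8 tokens via a strided conv.
        self.to_tokens = nn.Conv2d(1, dim, kernel_size=token, stride=token)
        n_tokens = (patch // token) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, out_dim)

    def forward(self, x):                                  # x: (B, 1, 64, 64)
        t = self.to_tokens(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        t = self.encoder(t + self.pos)
        desc = self.head(t.mean(dim=1))                    # mean-pool tokens
        return F.normalize(desc, dim=-1)                   # unit-length descriptor

desc_net = PatchDescriptor()
dummy = torch.randn(4, 1, 64, 64)
print(desc_net(dummy).shape)                               # torch.Size([4, 128])
```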
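Finally, a sketch of the self-supervised triplet construction via random homographies, paired with a standard triplet margin loss as a stand-in for LT Loss, whose exact formulation is not reproduced here; the corner-jitter magnitude and margin are assumptions.

```python
# Sketch of step 4 (triplet construction) and a standard triplet margin
# loss standing in for LT Loss. Jitter and margin values are assumptions.
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def random_homography(size, jitter=8):
    """Perturb the four corners of a size x size patch to get a homography."""
    src = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
    dst = src + np.random.uniform(-jitter, jitter, src.shape).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def make_triplet(patches, i):
    """Anchor = patch i; positive = warped view of it; negative = other patch."""
    size = patches.shape[1]
    H = random_homography(size)
    positive = cv2.warpPerspective(patches[i], H, (size, size))
    j = (i + 1) % len(patches)          # any patch from a different keypoint
    return patches[i], positive, patches[j]

def triplet_loss(a, p, n, margin=1.0):
    """Pull anchor-positive descriptors together, push anchor-negative apart."""
    return F.relu(F.pairwise_distance(a, p)
                  - F.pairwise_distance(a, n) + margin).mean()

# Usage with synthetic patches and stand-in descriptors.
patches = (np.random.rand(10, 64, 64) * 255).astype(np.uint8)
a_img, p_img, n_img = make_triplet(patches, 0)
a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triplet_loss(a, p, n))
```

Because the positive is a warped view of the anchor itself, ground-truth correspondences come for free from the known homography, which is what lets the model train without annotated cross-modal pairs.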
The proposed LTFormer framework outperforms traditional handcrafted feature descriptors and remains competitive with state-of-the-art deep learning-based methods when matching visible and NIR remote sensing image pairs, even in the absence of annotated data. The authors demonstrate the effectiveness, robustness, and efficiency of LTFormer through extensive experiments and ablation studies.
Stats
The WHU-OPT-SAR dataset is used; its optical imagery comprises 100 images with 4 channels (R, G, B, NIR) at a spatial resolution of 5 meters.
The first 80 images are used for training, and the remaining 20 are used for validation.
Quotes
"Matching visible and near-infrared (NIR) images remains a significant challenge in remote sensing image fusion."
"To address this challenge, this paper proposes a novel keypoint descriptor approach that obtains robust feature descriptors via a self-supervised matching network."
"Our approach outperforms conventional hand-crafted local feature descriptors and proves equally competitive compared to state-of-the-art deep learning-based methods, even amidst the shortage of annotated data."