Key Concepts
XoFTR is a cross-modal, cross-view method for local feature matching between thermal infrared (TIR) and visible images. It addresses the large texture and intensity differences between the two modalities through masked image modeling (MIM) pre-training, followed by fine-tuning with pseudo-thermal image augmentation. It also introduces a refined matching pipeline that compensates for scale discrepancies and improves match reliability through sub-pixel refinement.
Summary
The paper introduces XoFTR, a cross-modal, cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are robust to adverse lighting and weather conditions, but matching them against visible images is difficult because of significant texture and intensity differences.
To address this, the authors propose a two-stage approach:
- Masked Image Modeling (MIM) pre-training: The network is pre-trained to reconstruct randomly masked visible-thermal image pairs, allowing it to learn intensity differences in the thermal and visible spectra.
- Fine-tuning with pseudo-thermal image augmentation: The authors introduce a robust augmentation method to generate pseudo-thermal images from visible images, enabling the network to adapt to modality-induced variations.
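The two training stages can be illustrated with a minimal, pure-Python sketch. Everything here is an assumption for illustration, not the authors' code: images are 2-D lists of floats in [0, 1], the patch size and mask ratio are arbitrary, the loss is an MAE/SimMIM-style mean squared error over masked pixels only, and `pseudo_thermal` is a hypothetical stand-in (random polarity flip plus a random gamma curve) for the paper's augmentation, whose exact recipe this summary does not describe.

```python
import random

# Hypothetical sketch of the two training stages; patch size, mask ratio and the
# pseudo-thermal recipe are illustrative assumptions, not the authors' settings.

def random_patch_mask(height, width, patch=4, ratio=0.5, rng=None):
    """Boolean mask where a random `ratio` of patch-aligned blocks is True (masked)."""
    rng = rng or random.Random()
    cells = [(r, c) for r in range(height // patch) for c in range(width // patch)]
    chosen = set(rng.sample(cells, round(ratio * len(cells))))
    return [[(y // patch, x // patch) in chosen for x in range(width)]
            for y in range(height)]

def apply_mask(image, mask, fill=0.0):
    """Replace masked pixels with a constant fill value (the network input)."""
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def masked_mse(pred, target, mask):
    """Reconstruction loss averaged over masked pixels only (MAE/SimMIM-style)."""
    errs = [(p - t) ** 2
            for prow, trow, mrow in zip(pred, target, mask)
            for p, t, m in zip(prow, trow, mrow) if m]
    return sum(errs) / len(errs) if errs else 0.0

def pseudo_thermal(gray, rng=None):
    """Hypothetical pseudo-thermal augmentation: optional polarity flip plus a random gamma curve."""
    rng = rng or random.Random()
    invert = rng.random() < 0.5
    gamma = rng.uniform(0.5, 2.0)
    return [[(1.0 - v if invert else v) ** gamma for v in row] for row in gray]

# Usage on an 8 x 8 synthetic gradient image with values in [0, 1].
rng = random.Random(0)
img = [[(x + y) / 14 for x in range(8)] for y in range(8)]
mask = random_patch_mask(8, 8, patch=4, ratio=0.5, rng=rng)
masked = apply_mask(img, mask)
print(sum(m for row in mask for m in row))  # 32: two of the four 4 x 4 patches are masked
print(masked_mse(masked, img, mask) > 0.0)  # True: zeros differ from the original pixels
```

Applying the same helpers to each image of a visible-thermal pair gives masked inputs for pre-training; whether the two images share one mask or use independent masks is not stated in this summary.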
Additionally, the authors propose a refined matching pipeline that:
- Adjusts for scale discrepancies by allowing both one-to-one and one-to-many matches during coarse matching at 1/8 of the original resolution.
- Enhances match reliability through a fine matching module that re-matches coarse-level predictions at 1/2 scale and filters low-confidence matches.
- Refines matches at the sub-pixel level using a regression mechanism to prevent a point in one image from matching with multiple points in the other.
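The selection logic of such a coarse-to-fine pipeline can be sketched in the same hedged way. This is a toy illustration under stated assumptions, not XoFTR's implementation: confidences come from a LoFTR-style dual softmax over a similarity matrix; one-to-many coarse matching is approximated by keeping a pair that is maximal along its row OR its column; the fine stage uses a mutual-nearest-neighbour (one-to-one) check with a confidence threshold; and a 2-D soft-argmax over a local score window stands in for the learned sub-pixel regression. The temperature and threshold values are arbitrary.

```python
import math

# Toy sketch of coarse-to-fine match selection; thresholds, temperature and the
# selection rules are assumptions for illustration, not XoFTR's actual code.

def _softmax(values):
    mx = max(values)
    ex = [math.exp(v - mx) for v in values]
    z = sum(ex)
    return [e / z for e in ex]

def dual_softmax(sim, temperature=0.1):
    """LoFTR-style confidence: softmax over each row times softmax over each column."""
    scaled = [[s / temperature for s in row] for row in sim]
    rows = [_softmax(row) for row in scaled]
    cols = [_softmax(list(col)) for col in zip(*scaled)]  # cols[j][i]
    return [[rows[i][j] * cols[j][i] for j in range(len(sim[0]))]
            for i in range(len(sim))]

def coarse_matches(conf, threshold=0.2):
    """Keep (i, j) if confident and maximal along its row OR its column (allows one-to-many)."""
    row_max = [max(row) for row in conf]
    col_max = [max(col) for col in zip(*conf)]
    return [(i, j) for i, row in enumerate(conf) for j, c in enumerate(row)
            if c >= threshold and (c == row_max[i] or c == col_max[j])]

def mutual_nn_matches(conf, threshold=0.2):
    """Keep (i, j) only if confident and maximal along its row AND its column (one-to-one)."""
    row_max = [max(row) for row in conf]
    col_max = [max(col) for col in zip(*conf)]
    return [(i, j) for i, row in enumerate(conf) for j, c in enumerate(row)
            if c >= threshold and c == row_max[i] and c == col_max[j]]

def soft_argmax_2d(window):
    """Expected (dx, dy) offset from the centre of a square score window under softmax."""
    size = len(window)
    r = size // 2
    w = _softmax([v for row in window for v in row])
    dx = sum(w[y * size + x] * (x - r) for y in range(size) for x in range(size))
    dy = sum(w[y * size + x] * (y - r) for y in range(size) for x in range(size))
    return dx, dy

# Usage on a hand-made 2 x 2 confidence matrix.
conf = [[0.9, 0.1],
        [0.6, 0.3]]
print(coarse_matches(conf))     # [(0, 0), (1, 0), (1, 1)]: column 0 is matched twice
print(mutual_nn_matches(conf))  # [(0, 0)]
```

Relaxing the coarse criterion from AND to OR is what lets one cell keep several candidates in the other image in this sketch; the stricter mutual check at the fine level then prunes them back to one-to-one. In a full pipeline, each coarse match at 1/8 scale would be re-examined over the corresponding 4 x 4 block of 1/2-scale features (8 / 2 = 4) before the sub-pixel step.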
The authors also introduce a new challenging visible-thermal image matching dataset, METU-VisTIR, covering a wide range of viewpoint differences and weather conditions.
Through extensive experiments, the authors demonstrate that XoFTR outperforms strong baselines, achieving state-of-the-art results in visible-thermal image matching and homography estimation tasks.
Statistics
Thermal images typically have a lower resolution and a narrower field of view than visible images.
Thermal and visible images differ significantly in texture and exhibit nonlinear intensity differences, because they arise from distinct radiation mechanisms.
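Why nonlinear intensity differences hurt matching can be shown with zero-mean normalized cross-correlation (NCC), a classic intensity-based similarity measure. NCC is invariant to affine brightness and contrast changes, but not to polarity inversion or non-monotonic mappings. The 1-D profile and the sine mapping below are toy stand-ins chosen for illustration, not a physical model of thermal emission.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length intensity vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

# A 1-D "visible" intensity profile and three hypothetical re-mappings of it.
patch = [0.1, 0.2, 0.4, 0.7, 0.9, 0.5, 0.3]
affine = [0.5 * v + 0.2 for v in patch]        # brightness/contrast change
inverted = [1.0 - v for v in patch]            # polarity flip (bright becomes dark)
nonmono = [math.sin(4.0 * v) for v in patch]   # toy non-monotonic mapping

print(round(ncc(patch, affine), 6))    # 1.0
print(round(ncc(patch, inverted), 6))  # -1.0
print(abs(ncc(patch, nonmono)) < 0.9)  # True
```

A learned matcher has to bridge mappings of this kind, which is consistent with the motivation given above for MIM pre-training and pseudo-thermal augmentation.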
Quotes
"Unlike visible images, thermal infrared (TIR) images are robust against adverse light and weather conditions such as rain, fog, snow, and night [19, 40]."
"To match TIR-visible images, many hand-crafted [10, 29, 35, 37, 45] and learning-based [1, 8, 15, 17, 55] methods have been proposed. Despite the promising results reported, performances across different viewpoints, scales, and poor textures have been sub-optimal."