Enhancing Visible-Thermal Image Matching with Cross-modal Feature Matching Transformer (XoFTR)
XoFTR is a cross-modal, cross-view method for local feature matching between thermal infrared (TIR) and visible images. It addresses the large texture and intensity differences between the two modalities through masked image modeling pre-training, followed by fine-tuning with pseudo-thermal image augmentation. It also introduces a refined matching pipeline that adjusts for scale discrepancies and improves match reliability through sub-pixel refinement.
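To make the idea of masked image modeling more concrete, here is a minimal sketch of its masking step: hiding a random subset of image patches that a network would then be trained to reconstruct. This is a generic illustration, not XoFTR's actual pre-training code. The function name `mask_random_patches`, the patch size of 16, and the mask ratio of 0.5 are assumptions chosen for the example.

```python
import numpy as np


def mask_random_patches(image, patch_size=16, mask_ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper for illustration only.
    Returns the masked image and a boolean patch grid (True = masked).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    grid_h, grid_w = h // patch_size, w // patch_size
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))

    # Pick the patches to hide, sampling without replacement.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)

    # Expand the patch-level grid to a pixel-level mask.
    pixel_mask = grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    masked = image.copy()  # leave the input untouched
    masked[pixel_mask] = 0
    return masked, grid


# Usage: mask half of the 16x16 patches in a 64x64 single-channel image.
image = np.ones((64, 64), dtype=np.float32)
masked, grid = mask_random_patches(image, rng=np.random.default_rng(0))
```

In a pre-training setup of this kind, the masked image would be the network input and the original pixels under the masked patches would be the reconstruction target. How XoFTR selects patches and defines its reconstruction loss is not specified here.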