Leveraging Pretrained Masked Autoencoders for Comprehensive Feature Extraction and Guided Training in Infrared and Visible Image Fusion
The proposed MaeFuse framework leverages a pretrained Masked Autoencoder (MAE) encoder to extract both low-level and high-level visual features for infrared and visible image fusion. A guided training strategy is introduced to align the fusion layer's feature domain with the encoder's feature space, enabling seamless integration of complementary information from the two modalities.
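The pipeline described above can be sketched in miniature: a frozen encoder shared by both modalities, a trainable fusion layer over the concatenated features, and a guidance loss that pulls the fused features toward the encoder's feature space. This is a hedged illustration, not the paper's implementation: the real MaeFuse encoder is a pretrained ViT-based MAE, whereas here a fixed random projection stands in for it, and the elementwise-max guidance target is an assumption chosen only to make the alignment idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH_DIM, FEAT_DIM = 64, 32

# Frozen stand-in for the pretrained MAE encoder (shared across modalities).
W_enc = rng.standard_normal((PATCH_DIM, FEAT_DIM)) / np.sqrt(PATCH_DIM)

def encode(x):
    """Apply the frozen 'encoder' to a batch of flattened patches."""
    return np.tanh(x @ W_enc)

def fuse(f_ir, f_vis, W_fuse):
    """Trainable linear fusion layer over concatenated modality features."""
    return np.concatenate([f_ir, f_vis], axis=1) @ W_fuse

# Synthetic infrared / visible patch batches.
ir = rng.standard_normal((16, PATCH_DIM))
vis = rng.standard_normal((16, PATCH_DIM))
f_ir, f_vis = encode(ir), encode(vis)

# Guided training (illustrative): regress the fused features toward a target
# that lives in the encoder's feature space -- here the elementwise max of
# the two streams (an assumed target, not the paper's exact loss).
target = np.maximum(f_ir, f_vis)
W_fuse = rng.standard_normal((2 * FEAT_DIM, FEAT_DIM)) * 0.01

lr, losses = 0.1, []
X = np.concatenate([f_ir, f_vis], axis=1)
for _ in range(200):
    err = fuse(f_ir, f_vis, W_fuse) - target
    losses.append(float((err ** 2).mean()))
    # Gradient of mean squared error w.r.t. W_fuse; encoder stays frozen.
    W_fuse -= lr * (X.T @ err) * (2.0 / err.size)
```

Because only the fusion layer is updated while the encoder weights stay fixed, the loss drives the fused representation into the frozen encoder's feature domain, which is the core idea behind the guided training strategy.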