Key concepts
DAE-Fuse, a novel two-phase discriminative autoencoder framework, generates sharp and natural fused images by introducing adversarial feature extraction and attention-guided cross-modality fusion.
Summary
The paper proposes a novel two-phase discriminative autoencoder framework, termed DAE-Fuse, for multi-modality image fusion.
In the first phase (adversarial feature extraction):
- The model employs shallow and deep encoders to extract multi-level features, separating high- and low-frequency information.
- Two discriminative blocks provide an additional adversarial loss that guides feature extraction through reconstruction of the source images (see the sketch after this list).
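Below is a minimal PyTorch sketch of how such a phase-one setup could be wired together: a shallow/deep encoder pair, a decoder that reconstructs the source image, and a discriminative block supplying an adversarial term. The class names, layer widths, and loss weight are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ShallowDeepEncoder(nn.Module):
    """Illustrative encoder: a shallow branch for high-frequency detail
    and a deeper branch for low-frequency, global structure."""
    def __init__(self, in_ch=1, base_ch=16):
        super().__init__()
        self.shallow = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.deep = nn.Sequential(
            nn.Conv2d(base_ch, base_ch * 2, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch * 2, base_ch * 2, 3, padding=1), nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        s = self.shallow(x)   # shallow / high-frequency features
        d = self.deep(s)      # deep / low-frequency features
        return s, d

class Decoder(nn.Module):
    """Reconstructs the source image from the concatenated features."""
    def __init__(self, base_ch=16, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(base_ch * 3, base_ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, s, d):
        return self.net(torch.cat([s, d], dim=1))

class Discriminator(nn.Module):
    """Patch-style discriminative block scoring real vs. reconstructed images."""
    def __init__(self, in_ch=1, base_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch, base_ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch * 2, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# One phase-1 training step (reconstruction + adversarial loss), shown for a single modality.
encoder, decoder, disc = ShallowDeepEncoder(), Decoder(), Discriminator()
bce = nn.BCEWithLogitsLoss()
x = torch.rand(2, 1, 64, 64)                              # dummy source image batch

s, d = encoder(x)
recon = decoder(s, d)
rec_loss = nn.functional.l1_loss(recon, x)                # reconstruction term
adv_logits = disc(recon)
adv_loss = bce(adv_logits, torch.ones_like(adv_logits))   # encourage "real"-looking reconstructions
gen_loss = rec_loss + 0.1 * adv_loss                      # weighting is illustrative
```

In this sketch the discriminator sees the reconstruction and the adversarial term pushes the encoder-decoder pair toward outputs it cannot distinguish from the source, which is the role the paper assigns to its discriminative blocks in phase one.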
In the second phase (attention-guided cross-modality fusion):
- A cross-attention module naturally combines the feature embeddings from the different modalities before fusion (see the sketch after this list).
- The discriminative blocks are adapted to distinguish structural differences between the fused output and the source inputs, injecting more naturalness into the results.
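The following is a minimal PyTorch sketch of a cross-attention fusion block of the kind the summary describes: each modality's embeddings attend over the other's before the two streams are merged. The module name, dimensions, and merge layer are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention block: each modality queries the other,
    and the attended embeddings are merged before decoding."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn_ir_to_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_vis_to_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, f_ir, f_vis):
        # f_ir, f_vis: (batch, tokens, dim) feature embeddings per modality
        ir_att, _ = self.attn_ir_to_vis(f_ir, f_vis, f_vis)   # infrared queries visible
        vis_att, _ = self.attn_vis_to_ir(f_vis, f_ir, f_ir)   # visible queries infrared
        return self.merge(torch.cat([ir_att, vis_att], dim=-1))

# Example: fuse flattened feature maps from the two encoders.
fusion = CrossAttentionFusion(dim=64, heads=4)
f_ir, f_vis = torch.rand(2, 256, 64), torch.rand(2, 256, 64)  # dummy embeddings
fused = fusion(f_ir, f_vis)                                   # shape: (2, 256, 64)
```

Letting each modality attend to the other before fusion is what allows complementary information (thermal targets from infrared, texture from visible) to be exchanged at the feature level rather than blended only at the output.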
Extensive experiments on public infrared-visible fusion, medical image fusion, and downstream object detection datasets demonstrate the superiority and generalizability of DAE-Fuse in both quantitative and qualitative evaluations.
Statistics
Infrared images effectively capture thermal targets in dark environments but lack texture details.
Visible images maintain most of the texture details but are sensitive to lighting conditions.
Multi-modality image fusion aims to combine the advantages of both infrared and visible images.
Quotes
"GAN-based models use adversarial learning with zero-sum games in a fused image and source images to fuse two inputs."
"AE-based methods tend to effectively extract both global and local features from different modalities."