Core Concepts
The proposed Mansformer combines multiple self-attentions, gating, and multi-layer perceptrons (MLPs) to efficiently explore and exploit more possibilities of self-attention for image deblurring and other restoration tasks.
Abstract
The paper presents the Mansformer, an efficient Transformer architecture that combines multiple self-attentions, gating, and multi-layer perceptrons (MLPs) to address the quadratic computational complexity of typical self-attention in high-resolution vision tasks.
Key highlights:
- Designed four types of self-attention (local spatial, local channel, global spatial, global channel), each with computational complexity linear in image resolution, to capture both local and global dependencies.
- Proposed the gated-dconv MLP (gdMLP) module to condense the two-stage Transformer design into a single stage, outperforming the two-stage architecture at a similar model size and computational cost (a minimal sketch follows this list).
- Evaluated the Mansformer on image deblurring, deblurring with JPEG artifacts, deraining, and real image denoising, achieving state-of-the-art performance in terms of both accuracy and efficiency.
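The summary does not spell out the gdMLP internals, so the sketch below is a minimal PyTorch rendering of a gated depthwise-conv MLP block, assuming the common pattern of a 1x1 channel expansion, a 3x3 depthwise convolution, and an element-wise gate. The module name, expansion factor, and exact gating order are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class GatedDConvMLP(nn.Module):
    """Minimal sketch of a gated-dconv MLP (gdMLP) block.

    Assumptions (not taken from the paper's code): a 1x1 conv expands the
    channels, a 3x3 depthwise conv mixes local spatial context, the result
    is split into two halves, and one half gates the other before a 1x1
    projection back to the input width.
    """

    def __init__(self, channels: int, expansion: float = 2.0):
        super().__init__()
        hidden = int(channels * expansion)
        self.expand = nn.Conv2d(channels, hidden * 2, kernel_size=1)
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                                padding=1, groups=hidden * 2)
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map
        gate, value = self.dwconv(self.expand(x)).chunk(2, dim=1)
        return self.project(self.act(gate) * value)


if __name__ == "__main__":
    block = GatedDConvMLP(channels=48)
    out = block(torch.randn(1, 48, 64, 64))
    print(out.shape)  # torch.Size([1, 48, 64, 64])
```

Because the depthwise convolution already injects local spatial mixing into the gate, a block of this kind can plausibly stand in for the usual separate feed-forward stage, which is consistent with the paper's one-stage design claim.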
The authors first provide an overview of the Mansformer architecture, which follows a multi-scale hierarchical U-Net framework. They then describe the mixed attention mechanism in detail, including the formulations of the four types of self-attention, and explain the gated-dconv MLP module, which replaces the typical feed-forward network (FFN) in Transformers.
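To make the linear-complexity claim concrete, the sketch below illustrates one of the four variants, global channel self-attention, where attention is computed over a CxC channel map rather than an HWxHW spatial map, so the cost grows linearly with the number of pixels. The head count, normalization, and learnable temperature are assumptions in the spirit of transposed channel attention, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalChannelAttention(nn.Module):
    """Sketch of global channel self-attention (one of the four variants).

    Attention is taken over the channel dimension, so the HWxHW attention
    map of vanilla self-attention becomes a CxC map: the cost is linear in
    the number of pixels. Head count and the learnable temperature are
    illustrative assumptions, not the paper's exact settings.
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))
        self.to_qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)
        # Reshape to (B, heads, C/heads, H*W) and attend across channels.
        q = q.view(b, self.heads, c // self.heads, h * w)
        k = k.view(b, self.heads, c // self.heads, h * w)
        v = v.view(b, self.heads, c // self.heads, h * w)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, heads, C/h, C/h)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).view(b, c, h, w)
        return self.project(out)
```

The local variants would restrict the same computation to non-overlapping windows, and the spatial variants would attend over pixels within each window instead of over channels; those details are omitted here.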
Extensive experiments on various image restoration tasks demonstrate the effectiveness and efficiency of the proposed Mansformer compared to existing state-of-the-art methods. The authors also conduct an ablation study to analyze the contributions of different components of the Mansformer.
Statistics
The paper provides the following key figures and metrics:
FLOPs vs PSNR on the HIDE dataset for deblurring (Fig. 1a)
FLOPs vs PSNR on multiple deraining datasets (Fig. 1b)
PSNR and SSIM results on the GoPro and HIDE datasets for deblurring (Table 1)
PSNR and SSIM results on the REDS-val-300 dataset for deblurring with JPEG artifacts (Table 2)
PSNR and SSIM results on multiple deraining datasets (Table 3)
Ablation study results on the GoPro dataset for deblurring (Table 4)
PSNR and SSIM results on the SIDD dataset for real image denoising (Table 5)