Core Concepts
The transformer-based CFAT enhances image super-resolution by introducing triangular attention windows, improving reconstruction quality and reducing boundary-level distortion.
Abstract
Abstract:
Transformer models have advanced image super-resolution by exploiting rich, long-range contextual features.
The proposed CFAT combines triangular and rectangular windows so that attention covers more diverse spatial regions.
Introduction:
Efficient image compression is typically lossy, creating the need for super-resolution techniques that recover lost detail.
CNN-based models improved SR capabilities but were limited in feature extraction by their local receptive fields.
Related Works:
CNN-based SR models like EDSR and SAN improved feature extraction and attention mechanisms.
Vision Transformer (ViT)-based models such as SwinIR and ART effectively leverage long-range dependencies.
Proposed Method:
The CFAT architecture comprises head, body, and tail modules for shallow feature extraction, deep feature extraction, and reconstruction, respectively.
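The head/body/tail pipeline above follows the standard SR layout: the head lifts the image into a feature space, the body refines features, and the tail upsamples and reconstructs. The sketch below is a minimal NumPy stand-in for that flow; the function names, feature width, and the identity body are illustrative assumptions, not CFAT's actual layers, and the tail uses a plain pixel-shuffle rearrangement in place of the paper's reconstruction module.

```python
import numpy as np

def head(img, n_feats=12):
    # Shallow feature extraction: lift RGB channels to a feature space
    # (a stand-in for an initial convolution; weights are random here).
    h, w, c = img.shape
    rng = np.random.default_rng(0)
    weight = rng.standard_normal((c, n_feats)) * 0.1
    return img @ weight  # (h, w, n_feats)

def body(feats):
    # Deep feature extraction placeholder: in CFAT this is a stack of
    # window-attention blocks; here just an identity residual.
    return feats + 0.0

def tail(feats, scale=2):
    # Reconstruction via pixel shuffle: fold channels into space to
    # upsample by `scale` (n_feats must be divisible by scale**2 * 3).
    h, w, c = feats.shape
    c_out = c // (scale * scale)
    x = feats.reshape(h, w, scale, scale, c_out)
    x = x.transpose(0, 2, 1, 3, 4)  # (h, scale, w, scale, c_out)
    return x.reshape(h * scale, w * scale, c_out)

lr = np.random.default_rng(1).random((16, 16, 3))
sr = tail(body(head(lr, n_feats=12)), scale=2)
print(sr.shape)  # (32, 32, 3)
```

A 16x16 low-resolution input comes out as a 32x32 image at scale 2; in the real model, the body is where the triangular/rectangular window attention does the work.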
The triangular window technique enables additional shifting modes and reduces boundary-level distortion.
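To make the triangular-window idea concrete, the sketch below partitions a square window into four triangles along its diagonals and restricts self-attention to the pixels of one triangle. This is a simplified illustration under my own assumptions (names like `triangular_masks` and the single-head, unprojected attention are mine), not the paper's exact window scheme.

```python
import numpy as np

def triangular_masks(n):
    # Split an n x n window into four disjoint triangular sub-windows
    # (top, right, bottom, left) using the two diagonals.
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    a = i <= j          # on/above the main diagonal
    b = i + j <= n - 1  # on/above the anti-diagonal
    return {"top": a & b, "right": a & ~b, "bottom": ~a & ~b, "left": ~a & b}

def triangle_attention(x, mask):
    # Plain softmax self-attention among only the pixels where mask is True;
    # pixels outside the triangle pass through unchanged.
    h, w, c = x.shape
    flat = x.reshape(-1, c)
    idx = np.flatnonzero(mask.ravel())
    tokens = flat[idx]
    scores = tokens @ tokens.T / np.sqrt(c)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = flat.copy()
    out[idx] = weights @ tokens
    return out.reshape(h, w, c)

masks = triangular_masks(8)
x = np.random.default_rng(2).random((8, 8, 4))
y = triangle_attention(x, masks["top"])
```

The four masks tile the window exactly (every pixel belongs to one triangle), which is what lets triangular windows alternate with rectangular ones without leaving gaps.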
Experiments:
Extensive experiments on benchmark datasets show that CFAT outperforms state-of-the-art models.
Ablation Study:
Varying hyperparameters such as window size, shift size, and interval size significantly affects model performance.
Comparisons:
Quantitative comparison shows CFAT outperforms other state-of-the-art methods in terms of PSNR and SSIM.
Conclusion:
CFAT with triangular windows offers a novel approach to image super-resolution with improved performance.
Stats
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.