Core Concepts
Proposing a novel Frequency-Aware Transformer (FAT) block for learned image compression that efficiently captures multiscale and directional frequency components.
Abstract
The paper introduces the Frequency-Aware Transformer (FAT) block for learned image compression, addressing the limitations of existing methods in capturing anisotropic frequency components and preserving directional details. The FAT block comprises frequency-decomposition window attention (FDWA) modules and a frequency-modulation feed-forward network (FMFFN) to improve rate-distortion performance. Additionally, a transformer-based channel-wise autoregressive (T-CA) model is presented to exploit channel dependencies effectively. Experimental results show superior rate-distortion performance compared to existing methods.
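To make the FDWA idea concrete, here is a minimal PyTorch sketch of frequency-decomposition window attention under simplifying assumptions: channel groups attend within non-overlapping windows of different shapes (small and large squares for different frequency bands, horizontal and vertical strips for directional components). The specific window shapes, head width, and dimensions below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FDWA(nn.Module):
    """Frequency-decomposition window attention (illustrative sketch).

    Channels are split into groups, and each group runs multi-head
    self-attention inside non-overlapping windows of a different shape:
    small / large squares plus horizontal / vertical strips, so different
    heads see different frequency bands and directions.
    """

    def __init__(self, dim=128, head_dim=32,
                 window_shapes=((4, 4), (16, 16), (4, 32), (32, 4))):
        super().__init__()
        assert dim % len(window_shapes) == 0
        self.window_shapes = window_shapes
        self.group_dim = dim // len(window_shapes)
        self.num_heads = max(self.group_dim // head_dim, 1)
        self.qkv = nn.Conv2d(dim, dim * 3, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def _window_attn(self, q, k, v, wh, ww):
        # q, k, v: (B, g, H, W); H and W must be divisible by wh and ww.
        B, g, H, W = q.shape
        nh, hd = self.num_heads, g // self.num_heads

        def to_windows(t):                       # -> (B * #windows, nh, wh*ww, hd)
            t = t.reshape(B, nh, hd, H // wh, wh, W // ww, ww)
            t = t.permute(0, 3, 5, 1, 4, 6, 2)   # B, nWh, nWw, nh, wh, ww, hd
            return t.reshape(-1, nh, wh * ww, hd)

        out = F.scaled_dot_product_attention(to_windows(q), to_windows(k),
                                             to_windows(v))
        out = out.reshape(B, H // wh, W // ww, nh, wh, ww, hd)
        return out.permute(0, 3, 6, 1, 4, 2, 5).reshape(B, g, H, W)

    def forward(self, x):                        # x: (B, C, H, W)
        q, k, v = self.qkv(x).chunk(3, dim=1)
        outs = []
        for i, (wh, ww) in enumerate(self.window_shapes):
            grp = slice(i * self.group_dim, (i + 1) * self.group_dim)
            outs.append(self._window_attn(q[:, grp], k[:, grp], v[:, grp], wh, ww))
        return self.proj(torch.cat(outs, dim=1))
```

For example, `FDWA(dim=128)(torch.randn(1, 128, 32, 32))` returns a `(1, 128, 32, 32)` tensor. Strip-shaped windows give some heads an elongated receptive field at the same cost as a square window, which is what lets the block separate horizontal from vertical frequency content.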
1. Introduction
- Learned image compression models have emerged as effective alternatives to traditional standardized codecs.
- Existing CNN-based models are limited in capturing anisotropic frequency components and preserving directional details.
- Transformers are introduced to capture non-local spatial relationships, improving rate-distortion (R-D) performance.
2. Methods
- Proposed a Frequency-Aware Transformer (FAT) block composed of FDWA and FMFFN modules; minimal sketches of both ideas follow this list.
- Introduced the transformer-based channel-wise autoregressive (T-CA) entropy model to exploit dependencies across frequency components, also sketched below.
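The FMFFN can be pictured as a point-wise MLP whose hidden features are re-weighted element-wise in the 2D FFT domain by a learned complex filter, amplifying or attenuating individual frequency components. Below is a minimal sketch of that idea; the fixed spatial size (h, w) and hidden width are assumptions for illustration, and the paper's exact modulation design may differ.

```python
import torch
import torch.nn as nn

class FMFFN(nn.Module):
    """Frequency-modulation feed-forward network (illustrative sketch).

    A point-wise MLP whose hidden features are moved to the frequency
    domain with a 2D real FFT, re-weighted element-wise by a learned
    complex filter, and transformed back. Assumes a fixed input size
    (h, w); all sizes here are illustrative.
    """

    def __init__(self, dim=128, hidden_dim=256, h=16, w=16):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, hidden_dim, 1)
        self.act = nn.GELU()
        # Learned complex weights on the rFFT grid (w // 2 + 1 bins along width).
        self.weight = nn.Parameter(torch.randn(hidden_dim, h, w // 2 + 1, 2) * 0.02)
        self.fc2 = nn.Conv2d(hidden_dim, dim, 1)

    def forward(self, x):                             # x: (B, C, h, w)
        y = self.act(self.fc1(x))
        f = torch.fft.rfft2(y, norm="ortho")           # (B, hidden, h, w // 2 + 1)
        f = f * torch.view_as_complex(self.weight)     # amplify / attenuate bands
        y = torch.fft.irfft2(f, s=y.shape[-2:], norm="ortho")
        return self.fc2(y)
```

Similarly, the T-CA entropy model can be sketched as a causally masked transformer over channel chunks of the latent: the Gaussian parameters (mu, sigma) of chunk i are predicted from chunks 0..i-1. The chunk count, embedding width, and depth below are assumed values, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCA(nn.Module):
    """Transformer-based channel-wise autoregressive entropy model (sketch).

    The latent is split into S channel chunks; a causally masked transformer
    predicts the Gaussian parameters (mu, sigma) of chunk i from chunks < i.
    """

    def __init__(self, latent_dim=320, num_chunks=5, embed_dim=384, depth=4):
        super().__init__()
        self.num_chunks = num_chunks
        self.chunk_dim = latent_dim // num_chunks
        self.embed = nn.Linear(self.chunk_dim, embed_dim)
        self.start = nn.Parameter(torch.zeros(1, 1, embed_dim))  # seeds chunk 0
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(embed_dim, 2 * self.chunk_dim)     # -> mu, sigma

    def forward(self, y):            # y: (B, N, latent_dim), N spatial positions
        B, N, _ = y.shape
        tokens = self.embed(y.reshape(B * N, self.num_chunks, self.chunk_dim))
        # Shift right so chunk i only sees chunks decoded before it.
        inp = torch.cat([self.start.expand(B * N, 1, -1), tokens[:, :-1]], dim=1)
        S = self.num_chunks
        causal = torch.full((S, S), float("-inf"), device=inp.device).triu(1)
        h = self.blocks(inp, mask=causal)
        mu, sigma = self.head(h).chunk(2, dim=-1)
        return (mu.reshape(B, N, -1),
                F.softplus(sigma).reshape(B, N, -1))  # keep sigma positive
```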
3. Experiments
- Achieved state-of-the-art R-D performance on the Kodak, Tecnick, and CLIC datasets.
- Outperformed the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in BD-rate, respectively; a sketch of the BD-rate computation follows this list.
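For reference, BD-rate (Bjøntegaard delta rate) measures the average bitrate difference between two R-D curves at equal quality: fit cubic polynomials to log-rate as a function of PSNR for both codecs, integrate each over the overlapping PSNR range, and convert the average log-rate gap back to a percentage. A minimal NumPy sketch of the standard computation:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test codec vs. the anchor.

    Each input is an array of four or more R-D points; rates in bpp
    (any consistent unit works), quality in PSNR (dB).
    """
    # Fit cubic polynomials to log-rate as a function of PSNR.
    p_anchor = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(np.min(psnr_anchor), np.min(psnr_test))
    hi = min(np.max(psnr_anchor), np.max(psnr_test))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    avg_anchor = np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)
    avg_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_log_diff = (avg_test - avg_anchor) / (hi - lo)

    # Average rate difference in percent; negative means bitrate savings.
    return (np.exp(avg_log_diff) - 1) * 100
```

A negative result means the tested codec needs fewer bits than the anchor at the same quality, so "outperforming VTM-12.1 by 14.5%" corresponds to a BD-rate of -14.5% with VTM-12.1 as the anchor.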
Stats
Experiments show the method outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
Quotes
"Our method achieves state-of-the-art rate-distortion performance."
"Evidently outperforms latest standardized codec VTM-12.1."