Core Concept
WiTUnet, a novel encoder-decoder architecture, effectively integrates the global perception capabilities of Transformers and the local detail sensitivity of CNNs to significantly enhance low-dose CT image denoising performance, outperforming state-of-the-art methods.
Summary
The paper introduces WiTUnet, a novel encoder-decoder architecture that combines the strengths of Convolutional Neural Networks (CNNs) and Transformers to address the challenges of low-dose computed tomography (LDCT) image denoising.
Key highlights:
- The U-shaped WiTUnet architecture features a series of nested dense skip pathways that efficiently integrate high-resolution encoder features with semantically rich decoder features, improving information alignment (first sketch after this list).
- To capture non-local information while reducing computational complexity, WiTUnet incorporates a non-overlapping Window Transformer (WT) block, which includes a windowed multi-head self-attention (W-MSA) mechanism (second sketch below).
- To improve sensitivity to local information within the Transformer module, WiTUnet introduces a new CNN-based Local Image Perspective Enhancement (LiPe) block that replaces the traditional MLP (third sketch below).
- Extensive experiments on the NIH-AAPM-Mayo Clinic LDCT dataset demonstrate that WiTUnet significantly outperforms state-of-the-art denoising methods in terms of PSNR, SSIM, and RMSE, effectively reducing noise while preserving image details (metric definitions sketched below).
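The nested dense skip pathways re-use every earlier same-resolution feature at each decoder node, in the spirit of UNet++. The PyTorch sketch below is illustrative only: the module names (ConvBlock, NestedSkipNode) and the exact convolution recipe are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 conv + ReLU layers; a generic stand-in for the paper's conv unit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class NestedSkipNode(nn.Module):
    """One node of a nested dense skip pathway (UNet++ style): it fuses every
    earlier same-resolution feature with the upsampled feature from the level
    below, gradually aligning encoder detail with decoder semantics."""
    def __init__(self, ch, below_ch, num_prev):
        super().__init__()
        self.up = nn.ConvTranspose2d(below_ch, ch, kernel_size=2, stride=2)
        self.fuse = ConvBlock(ch * num_prev + ch, ch)

    def forward(self, prev_feats, below_feat):
        # prev_feats: list of (B, ch, H, W) features from earlier nodes at this level
        # below_feat: (B, below_ch, H/2, W/2) feature from one level deeper
        x = torch.cat(prev_feats + [self.up(below_feat)], dim=1)
        return self.fuse(x)
```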
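W-MSA restricts self-attention to non-overlapping local windows, so attention cost grows linearly with image area rather than quadratically. A minimal sketch follows; window_partition/window_reverse mirror the standard Swin-style bookkeeping, and the use of nn.MultiheadAttention without a relative position bias is a simplifying assumption.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(win, ws, H, W):
    """Inverse of window_partition: reassemble windows into (B, H, W, C)."""
    B = win.shape[0] // ((H // ws) * (W // ws))
    x = win.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WMSA(nn.Module):
    """Windowed multi-head self-attention: attention runs inside each local
    window, so cost scales with window size, not the full image."""
    def __init__(self, dim, num_heads, ws):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C), H and W divisible by ws
        B, H, W, C = x.shape
        win = window_partition(x, self.ws)   # (B * num_windows, ws*ws, C)
        out, _ = self.attn(win, win, win)    # self-attention within each window
        return window_reverse(out, self.ws, H, W)
```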
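The summary describes LiPe only as a CNN-based replacement for the Transformer's usual MLP, so the following is a plausible stand-in rather than the paper's exact design: a pointwise expansion, a depthwise 3x3 convolution to inject local spatial context, and a pointwise projection back. The layer recipe and expansion ratio are assumptions.

```python
import torch.nn as nn

class LiPe(nn.Module):
    """Hypothetical sketch of a Local Image Perspective Enhancement block.
    A depthwise 3x3 convolution between two pointwise convolutions stands in
    for the Transformer's MLP, restoring sensitivity to local detail.
    The exact layer recipe is an assumption, not taken from the paper."""
    def __init__(self, dim, expand=4):
        super().__init__()
        hidden = dim * expand
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),                               # pointwise expand
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),  # depthwise: local context
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),                               # pointwise project back
        )

    def forward(self, x):  # x: (B, C, H, W)
        return x + self.net(x)  # residual, as in a standard Transformer MLP
```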
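For reference, PSNR and RMSE have closed-form definitions; the NumPy helpers below compute them (SSIM involves local luminance/contrast/structure statistics and is usually taken from a library such as scikit-image). The function names and the data_range default are illustrative, not from the paper.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between a denoised image and its FDCT reference."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    return float(20.0 * np.log10(data_range / rmse(pred, target)))

# SSIM is typically computed with skimage.metrics.structural_similarity.
```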
Statistics
The LDCT images have a resolution of 512 × 512 pixels.
The dataset consists of full-dose CT (FDCT) and quarter-dose (simulated) LDCT image pairs from 10 anonymized patients, with data from patient L506 used for evaluation and the remaining 9 patients' data used for training.
Quotes
"WiTUnet, a novel encoder-decoder architecture, effectively integrates the global perception capabilities of Transformers and the local detail sensitivity of CNNs to significantly enhance low-dose CT image denoising performance, outperforming state-of-the-art methods."
"To capture non-local information while reducing computational complexity, WiTUnet incorporates a non-overlapping Window Transformer (WT) block, which includes a windowed multi-head self-attention (W-MSA) mechanism."
"To improve the sensitivity to local information within the Transformer module, WiTUnet introduces a new CNN-based Local Image Perspective Enhancement (LiPe) block, replacing the traditional MLP."