Core Concept
The proposed LaDiNE model combines transformer encoders with conditional diffusion models to improve the robustness and reliability of medical image classification. It outperforms existing state-of-the-art methods under a range of covariate shift conditions, including unseen noise, reduced resolution, contrast changes, and adversarial attacks.
Summary
The paper introduces LaDiNE, a novel ensemble learning method for medical image classification that aims to improve robustness and reliability. The key components of LaDiNE are:
- Transformer encoders (TEs) derived from Vision Transformers (ViTs) to extract invariant and informative features from the input images.
- A mapping network that encodes the TE features into a latent representation, which serves as a conditioning signal for the subsequent diffusion model.
- Conditional diffusion models (CDMs) that estimate the predictive distribution flexibly, without assuming a particular functional form for it.
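The summary gives no equations for how the three components fit together, so the following is only a toy sketch of the inference path under stated assumptions. Each ensemble member is represented by a precomputed feature vector (standing in for a TE output), a hypothetical linear `mapping_network`, and an untrained stand-in denoiser `toy_noise_predictor`. The diffusion runs over class-label vectors with a DDPM-style ancestral sampler, which is one common way to build a classification diffusion model but is not confirmed here as LaDiNE's exact formulation. The noise schedule, all weights, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical linear noise schedule (not taken from the paper).
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def mapping_network(features, W):
    """Hypothetical mapping network: projects encoder features to a latent code z."""
    return features @ W

def toy_noise_predictor(y_t, t, z):
    """Untrained stand-in for a conditional denoiser eps_theta(y_t, t, z).
    It guesses the clean label vector as softmax(z + y_t) and returns the noise
    that would map that guess to y_t under the forward process."""
    y0_hat = softmax(z + y_t)
    return (y_t - np.sqrt(alpha_bars[t]) * y0_hat) / np.sqrt(1.0 - alpha_bars[t])

def sample_label(z, rng):
    """DDPM ancestral sampling of a label vector, conditioned on the latent z."""
    y = rng.standard_normal(z.shape)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = toy_noise_predictor(y, t, z)
        mean = (y - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0  # no noise on the last step
        y = mean + np.sqrt(betas[t]) * noise
    return y

def predictive_distribution(member_features, member_weights, n_samples, rng):
    """Pool argmax decisions over all members and draws into class frequencies."""
    n_classes = member_weights[0].shape[1]
    counts = np.zeros(n_classes)
    for feats, W in zip(member_features, member_weights):
        z = mapping_network(feats, W)  # latent conditioning signal
        for _ in range(n_samples):
            counts[np.argmax(sample_label(z, rng))] += 1
    return counts / counts.sum()

rng = np.random.default_rng(0)
W = np.tile([[5.0, -5.0]], (4, 1))      # hypothetical weights favouring class 0
feats = [np.ones(4) for _ in range(3)]  # three hypothetical ensemble members
probs = predictive_distribution(feats, [W] * 3, n_samples=5, rng=rng)
```

Because every sample is a draw rather than a single point estimate, the spread of `probs` reflects uncertainty across both ensemble members and diffusion samples; a trained denoiser would replace the toy predictor.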
The proposed method is evaluated on two medical imaging benchmarks: the Tuberculosis chest X-ray dataset and the ISIC Melanoma skin cancer dataset. Extensive experiments are conducted under various covariate shift conditions, including Gaussian noise, low resolution, low contrast, and adversarial attacks.
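The summary does not define the perturbation parameters (σ for noise, w for resolution, r for contrast), so the functions below are one plausible parameterization, not the paper's protocol. Assumptions: images are grayscale arrays in [0, 1]; noise is additive zero-mean Gaussian with standard deviation `sigma`; low resolution is block-average downsampling by an integer `factor` followed by nearest-neighbour upsampling; low contrast shrinks deviations from the mean by a hypothetical `keep` fraction. How these map onto the paper's σ, w, and r is not stated here.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Additive zero-mean Gaussian noise with std sigma, clipped to [0, 1]."""
    return np.clip(img + sigma * rng.standard_normal(img.shape), 0.0, 1.0)

def lower_resolution(img, factor):
    """Block-average downsampling by an integer factor, then nearest-neighbour
    upsampling back to the original size (assumed definition)."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def lower_contrast(img, keep):
    """Shrink deviations from the mean intensity: keep=1 leaves img unchanged,
    keep=0 yields a flat image (assumed definition)."""
    mean = img.mean()
    return mean + keep * (img - mean)

rng = np.random.default_rng(0)
img = rng.random((8, 8))  # stand-in for a grayscale image scaled to [0, 1]
noisy = add_gaussian_noise(img, sigma=1.0, rng=rng)
blocky = lower_resolution(img, factor=4)
flat = lower_contrast(img, keep=0.3)
```

With `sigma=1.0` the noise standard deviation equals the full intensity range, which gives some intuition for why unprotected classifiers can collapse to chance level under such noise.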
The results show that LaDiNE matches or outperforms popular baseline methods in both classification accuracy and confidence calibration, indicating greater robustness and reliability. Specifically:
- Under clean input conditions, LaDiNE performs on par with the best-performing baseline methods.
- When presented with perturbed inputs, LaDiNE exhibits significantly higher robustness compared to the baselines. It maintains high accuracy even under severe Gaussian noise, low resolution, and low contrast conditions.
- LaDiNE also demonstrates superior resilience against gradient-based adversarial attacks, outperforming other methods across different attack types.
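The summary does not name the specific attacks used, so this sketch shows the Fast Gradient Sign Method (FGSM), one widely used gradient-based attack, against a toy logistic-regression classifier rather than LaDiNE itself. The weights, input, and `eps` value are hypothetical; the gradient of binary cross-entropy with respect to the input is computed analytically as (p − y)·w.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic-regression classifier p = sigmoid(w.x + b)."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps):
    """FGSM: one step of size eps along sign(dL/dx), clipped to [0, 1]."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # analytic gradient of BCE with respect to the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.standard_normal(16)  # hypothetical trained weights
b = 0.0
x = np.full(16, 0.5)         # hypothetical flattened input image
y = 1.0
x_adv = fgsm(x, y, w, b, eps=0.1)
print(bce_loss(x, y, w, b), bce_loss(x_adv, y, w, b))
```

Each pixel moves by at most `eps`, yet the loss rises because every change is aligned with the gradient; iterative variants such as PGD repeat this step with a projection back into the `eps`-ball.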
The authors argue that the proposed approach increases the feasibility of deploying reliable medical machine learning models in real clinical settings, where accurate and trustworthy predictions are crucial for patient care and clinical decision support.
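Confidence calibration, which the summary reports alongside accuracy, asks whether a model's stated confidence matches how often it is right. The summary does not say which calibration metric was used; a common choice is the expected calibration error (ECE), sketched below with equal-width confidence bins. The function name and the default of 10 bins are assumptions for illustration.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then take the sample-weighted
    average of |accuracy - mean confidence| over non-empty bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Two predictions made with 95% confidence, only one of them correct.
print(expected_calibration_error([[0.95, 0.05], [0.95, 0.05]], [0, 1]))
```

A perfectly calibrated model scores 0; an overconfident one, as in the usage line above, scores the gap between its confidence and its accuracy.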
Statistics
"Gaussian noise with σ = 1.00 reduces the accuracy of ResNet-18 and ResNet-50 to 50% on the chest X-ray dataset."
"LaDiNE achieves 98.90% accuracy on the chest X-ray dataset under low-resolution (w = 4.00) conditions, outperforming other methods."
"LaDiNE achieves 93.14% accuracy on the ISIC dataset under low-contrast (r = 0.70) conditions, on par with the best-performing ResNet-50."
Quotes
"LaDiNE consistently outperforms other methods across almost all perturbations, highlighting its effectiveness and robustness in handling noisy and perturbed images."
"LaDiNE, consistently outperforms other models across both datasets and all attack types, indicating superior robustness to adversarial attacks."