Key Concepts
Evaluation of the trustworthiness of pretrained transformer models, Swin UNETR and SMIT, for lung cancer segmentation.
Summary
The content evaluates the trustworthiness of two pretrained transformer models, Swin UNETR and SMIT, for lung cancer segmentation. It includes an abstract, introduction, experimental methods, results analysis, discussion, conclusions, ethical standards compliance, acknowledgments, and references. The study assesses accuracy on in-distribution (ID) data, robustness on out-of-distribution (OOD) data such as COVID-19 CT scans and MRI scans of ovarian and prostate cancers, and zero-shot generalization to T2-weighted MRIs. Key metrics include the Dice similarity coefficient (DSC), relative volume difference (RVD), precision (Pr), recall (Rc), area under the receiver operating characteristic curve (AUROC), and false positive rate at 95% true positive rate (FPR@95). The study highlights the importance of evaluating technical trustworthiness beyond segmentation accuracy alone.
Abstract:
- Assessed trustworthiness of Swin UNETR and SMIT for lung tumor segmentation.
- High accuracy on in-distribution data.
- Robustness on out-of-distribution data demonstrated.
- Zero-shot generalization to T2-weighted MRIs evaluated.
Introduction:
- Vision transformers with self-supervised pre-training are crucial for accurate medical image segmentation.
- Trustworthiness is essential in healthcare settings.
- OOD robustness is vital due to different imaging fields and new sequences.
Experimental Methods:
- Evaluated Swin UNETR and SMIT for lung cancer segmentation.
- Used DSC, RVD, Pr, Rc metrics for evaluation.
- Analyzed ID datasets along with near-OOD and far-OOD performance.
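The four segmentation metrics listed above can be sketched for a pair of binary masks as follows. This is a minimal illustration, not the study's evaluation code: the function name `segmentation_metrics` is hypothetical, and RVD is assumed here to be the signed difference (predicted volume minus reference volume) divided by the reference volume, which is one common convention.

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-8):
    """Compute DSC, precision, recall, and RVD for two binary masks.

    Assumption: RVD = (V_pred - V_target) / V_target, measured in voxels.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # voxels correctly labeled tumor
    fp = np.logical_and(pred, ~target).sum()   # predicted tumor, actually background
    fn = np.logical_and(~pred, target).sum()   # missed tumor voxels

    # eps guards against division by zero when a mask is empty
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    rvd = (pred.sum() - target.sum()) / (target.sum() + eps)

    return {"dsc": float(dsc), "precision": float(precision),
            "recall": float(recall), "rvd": float(rvd)}

# Toy 2x2 example: one overlapping voxel, one false positive, one false negative
metrics = segmentation_metrics(
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [1, 0]]),
)
print(metrics)
```

The same function works unchanged on 3D volumes, since every operation is element-wise over the whole array.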
Results Analysis:
- SMIT showed higher precision than Swin UNETR in tumor segmentation.
- Both models performed similarly on ID datasets but differed in OOD robustness.
- SMIT outperformed Swin UNETR in zero-shot generalization to T2-weighted MRIs.
Discussion:
- SMIT exhibited higher OOD robustness due to local and global image token prediction.
- Emphasized the importance of evaluating models on metrics beyond accuracy.
Conclusions:
- Evaluation highlighted differences between Swin UNETR and SMIT in trustworthiness aspects.
- Rigorous evaluation standards are crucial for safe clinical deployment.
Statistics
We measured segmentation accuracy on two public 3D CT datasets with Dice 0.80 for SMIT and 0.78 for Swin UNETR.
SMIT showed significantly better far-out-of-distribution accuracy on CT (AUROC 97.2% vs. 87.1%) and MRI (92.15% vs. 73.8%).
SMIT outperformed Swin UNETR in zero-shot segmentation on MRI with Dice 0.78 vs. 0.69.
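For readers unfamiliar with the OOD metrics reported above, AUROC and FPR@95 can be computed from per-sample scores roughly as sketched below. This is an illustrative NumPy-only version under stated assumptions: the function names are hypothetical, a higher score is assumed to indicate the "positive" class, and which class (ID or OOD) counts as positive is a convention that the summary does not specify.

```python
import numpy as np

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half. Pairwise comparison is O(n*m), fine for a sketch.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def fpr_at_95_tpr(pos_scores, neg_scores):
    """False positive rate at the threshold that keeps >= 95% of positives."""
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    neg = np.asarray(neg_scores, dtype=float)
    # Drop at most 5% of the lowest-scoring positives (integer arithmetic)
    k = (5 * len(pos)) // 100
    threshold = pos[k]
    return float(np.mean(neg >= threshold))

# Toy scores: positives mostly score higher than negatives
pos = [0.9, 0.8, 0.7, 0.6]
neg = [0.1, 0.2, 0.65, 0.3]
print(auroc(pos, neg))          # 0.9375
print(fpr_at_95_tpr(pos, neg))  # 0.25
```

A higher AUROC and a lower FPR@95 both indicate better separation between in-distribution and out-of-distribution inputs.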
Quotes
"Both models demonstrated high accuracy on in-distribution data."
"We believe higher OOD robustness occurred due to the combination of local and global image token prediction within a self-distillation network in SMIT."