The paper presents a novel method that explicitly masks background patches in the attention mechanism of Vision Transformers (ViTs) to improve model interpretability in computational pathology.
The key highlights are:
Whole-slide images (WSIs) in digital pathology contain both informative tissue regions and non-informative background areas. Including background patches in the ViT's attention mechanism can introduce artefacts and compromise model interpretability.
The proposed approach leverages fine-grained tissue segmentation masks to nullify, during self-attention, the contribution of patches that consist entirely of background. This ensures that region-level representations are derived exclusively from patches containing tissue, as sketched in the example below.
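The following is a minimal sketch of how such background masking could be wired into a single-head self-attention step. It assumes a standard scaled dot-product formulation and a precomputed per-patch boolean tissue mask; the function name, tensor shapes, and the -inf masking strategy are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_self_attention(q, k, v, tissue_mask):
    """Self-attention that ignores background-only patches.

    q, k, v:      (batch, num_patches, dim) query/key/value projections
    tissue_mask:  (batch, num_patches) boolean, True where a patch contains tissue
    """
    dim = q.size(-1)
    scores = q @ k.transpose(-2, -1) / dim ** 0.5        # (batch, N, N)
    # Nullify keys corresponding to background-only patches so that
    # no patch attends to them (their softmax weight becomes zero).
    key_mask = tissue_mask[:, None, :]                   # (batch, 1, N)
    scores = scores.masked_fill(~key_mask, float("-inf"))
    attn = F.softmax(scores, dim=-1)                     # weights spread over tissue keys only
    return attn @ v                                      # (batch, N, dim)

# Illustrative usage with random tensors; the first 10 of 16 patches are "tissue".
batch, n_patches, dim = 2, 16, 64
q = torch.randn(batch, n_patches, dim)
k = torch.randn(batch, n_patches, dim)
v = torch.randn(batch, n_patches, dim)
tissue_mask = torch.zeros(batch, n_patches, dtype=torch.bool)
tissue_mask[:, :10] = True
out = masked_self_attention(q, k, v, tissue_mask)
print(out.shape)  # torch.Size([2, 16, 64])
```

In this sketch, masking is applied only along the key dimension, so attention weights on background patches collapse to zero while tissue patches still produce outputs; how background query positions are handled downstream (e.g., excluded from pooling) is left open here.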
Experiments on the PANDA dataset for prostate cancer grading show that the masked attention method achieves comparable performance to plain self-attention while providing more accurate and clinically meaningful attention heatmaps.
The method has the potential to enhance the accuracy, robustness, and interpretability of ViT-based models in digital pathology, ultimately contributing to improved diagnostic accuracy.
Source: https://arxiv.org/pdf/2404.18152.pdf