The paper presents a novel method that explicitly masks background patches in the attention mechanism of Vision Transformers (ViTs) to improve model interpretability in computational pathology.
The key highlights are:
Whole-slide images (WSIs) in digital pathology contain both informative tissue regions and non-informative background areas. Including background patches in the ViT's attention mechanism can introduce artefacts and compromise model interpretability.
The proposed approach leverages fine-grained tissue segmentation masks to nullify the contribution of patches that are entirely background during self-attention. This ensures that region-level representations are derived exclusively from tissue-containing patches (a masked-attention sketch follows these highlights).
Experiments on the PANDA dataset for prostate cancer grading show that masked attention achieves performance comparable to plain self-attention while producing more accurate and clinically meaningful attention heatmaps.
The method has the potential to improve the accuracy, robustness, and interpretability of ViT-based models in digital pathology, ultimately supporting more reliable diagnosis.
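To make the masking step concrete, below is a minimal sketch of how background patches can be excluded from self-attention. This is not the paper's implementation: the tensor shapes, the `tissue_mask` argument, and the PyTorch API choices are illustrative assumptions (in the paper, the mask is derived from fine-grained tissue segmentation rather than generated randomly).

```python
import torch

def masked_self_attention(q, k, v, tissue_mask):
    """
    Exclude background patches from self-attention.

    q, k, v:      (batch, heads, num_patches, head_dim) projections of patch tokens.
    tissue_mask:  (batch, num_patches) bool, True where a patch contains tissue.

    Assumes every region has at least one tissue patch; otherwise the softmax
    over a fully masked row would produce NaNs.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (batch, heads, N, N)

    # Broadcast the patch mask over heads and query positions, then send
    # background keys to -inf so softmax assigns them zero attention weight.
    key_mask = tissue_mask[:, None, None, :]             # (batch, 1, 1, N)
    scores = scores.masked_fill(~key_mask, float("-inf"))

    attn = scores.softmax(dim=-1)
    return attn @ v                                       # tissue-only mixture of values


# Toy usage: 2 regions of 16 patches, 4 heads, 32-dim head projections.
b, h, n, d = 2, 4, 16, 32
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
tissue_mask = torch.rand(b, n) > 0.3                      # True = tissue patch
out = masked_self_attention(q, k, v, tissue_mask)         # (2, 4, 16, 32)
```

Because the mask only removes background keys, tissue patches still attend to one another normally, so any attention heatmap computed from `attn` reflects tissue regions alone.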