The paper presents a novel method that explicitly masks background patches in the attention mechanism of Vision Transformers (ViTs) to improve model interpretability in computational pathology.
The key highlights are:
Whole-slide images (WSIs) in digital pathology contain both informative tissue regions and non-informative background areas. Including background patches in the ViT's attention mechanism can introduce artefacts and compromise model interpretability.
The proposed approach leverages fine-grained tissue segmentation masks to nullify, during self-attention, the contribution of patches that are entirely background. This ensures that region-level representations are derived exclusively from patches containing tissue.
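A minimal sketch of what such key masking could look like in standard scaled dot-product attention. This is an illustration under assumptions, not the paper's actual implementation: the function name `masked_self_attention`, the `tissue_mask` argument, and all shapes are hypothetical.

```python
import torch

def masked_self_attention(q, k, v, tissue_mask):
    """Scaled dot-product attention that ignores background patches.

    Illustrative shapes:
        q, k, v:     (batch, heads, num_patches, head_dim)
        tissue_mask: (batch, num_patches) bool -- True where a patch
                     contains tissue, False where it is pure background.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (B, H, N, N)

    # Push logits for background *keys* to the most negative finite value,
    # so their post-softmax attention weights underflow to exactly zero.
    # (Using the dtype's min rather than -inf avoids NaN rows in the
    # degenerate case where a query has no tissue key to attend to.)
    key_mask = tissue_mask[:, None, None, :]       # broadcast over heads, queries
    scores = scores.masked_fill(~key_mask, torch.finfo(scores.dtype).min)

    weights = scores.softmax(dim=-1)               # background columns get weight 0
    return weights @ v
```

Masking the keys (columns of the attention matrix) is what guarantees that no token's updated representation aggregates information from background patches, which in turn is why the resulting attention heatmaps reflect tissue only.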
Experiments on the PANDA dataset for prostate cancer grading show that the masked attention method achieves comparable performance to plain self-attention while providing more accurate and clinically meaningful attention heatmaps.
The method has the potential to enhance the accuracy, robustness, and interpretability of ViT-based models in digital pathology, ultimately supporting more reliable diagnosis.
Key insights distilled from https://arxiv.org/pdf/2404.18152.pdf by Clém... at arxiv.org, 04-30-2024.