The paper presents a novel approach called ScoreCAM++ to improve the interpretability of deep learning models, particularly Convolutional Neural Networks (CNNs). The key highlights are:
The authors identify limitations in the existing ScoreCAM method and propose modifications to enhance the visual explanations generated by the model.
The core innovation lies in replacing the normalization function applied to the activation maps used in ScoreCAM. The authors employ the tanh function to amplify the contrast between high- and low-priority regions of the activation maps.
Additionally, the authors apply tanh to the upsampled activation maps before multiplying them with the input image. This selectively gates the lower-priority values within the activation maps, directing attention toward the most relevant features.
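The tanh-based gating described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scaling constant `scale` and the min-max pre-normalization are assumptions made for demonstration.

```python
import numpy as np

def tanh_gate(activation: np.ndarray, scale: float = 2.0) -> np.ndarray:
    """Sketch of tanh-based gating of an (upsampled) activation map.

    The map is min-max normalized, then passed through tanh with a
    scaling factor so high-priority regions saturate toward 1 while
    low-priority regions are pushed toward 0, sharpening the contrast
    between them. `scale` is an illustrative choice, not a value from
    the paper.
    """
    a_min, a_max = activation.min(), activation.max()
    norm = (activation - a_min) / (a_max - a_min + 1e-8)
    return np.tanh(scale * norm)

# Mask the input image with the gated activation map (toy 2x2 example):
activation = np.array([[0.1, 0.2],
                       [0.9, 1.0]])
image = np.ones((2, 2))
masked = image * tanh_gate(activation)
```

Compared with plain min-max normalization, the tanh squashing keeps strong activations near 1 while suppressing weak ones, which is the "selective gating" effect the summary describes.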
Extensive experiments are conducted on the Cats and Dogs dataset and the ImageNet dataset using the ResNet-18 and VGG-19 architectures. The proposed ScoreCAM++ consistently outperforms the existing state-of-the-art methods across various evaluation metrics, including Average Drop Percentage, Increase in Confidence, and Win Percentage.
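For reference, the first two metrics can be computed as below, using the definitions standard in the CAM-evaluation literature (assumed here; the paper may use slight variants). `full_conf` is the model's confidence on the original image, `masked_conf` its confidence on the explanation-masked image.

```python
import numpy as np

def average_drop(full_conf: np.ndarray, masked_conf: np.ndarray) -> float:
    """Average Drop %: mean over images of max(0, Y_c - O_c) / Y_c,
    times 100, where Y_c is the confidence on the full image and O_c
    the confidence on the masked image. Lower is better."""
    return float(np.mean(np.maximum(0.0, full_conf - masked_conf) / full_conf) * 100)

def increase_in_confidence(full_conf: np.ndarray, masked_conf: np.ndarray) -> float:
    """Increase in Confidence %: fraction of images whose masked
    version yields a higher confidence than the full image, times 100.
    Higher is better."""
    return float(np.mean(masked_conf > full_conf) * 100)

# Toy confidences for three images (illustrative numbers only):
full = np.array([0.9, 0.8, 0.6])
masked = np.array([0.85, 0.9, 0.3])
drop = average_drop(full, masked)
inc = increase_in_confidence(full, masked)
```

A good explanation method keeps Average Drop low (masking to the highlighted region preserves the prediction) and Increase in Confidence high (the highlighted region alone sometimes predicts better than the full image).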
Qualitative analysis through visualizations demonstrates that ScoreCAM++ provides more reliable and intuitive explanations compared to the baseline methods, highlighting the most salient regions in the input images.
The authors also perform ablation studies to validate the importance of the proposed components, such as the choice of activation function and the scaling of upsampled activation layers.
Overall, the paper presents a simple yet highly effective approach to enhance the interpretability of deep learning models, making significant contributions to the field of Explainable AI.
Key ideas extracted from the source content at arxiv.org, by Soham Mitra,... 05-01-2024
https://arxiv.org/pdf/2404.19341.pdf