The paper presents a novel training approach to improve the interpretability of convolutional neural networks by regularizing the standard gradient to be similar to the guided gradient. The key points are:
Motivation: The standard gradient obtained through backpropagation is often noisy, whereas the guided gradient, computed via guided backpropagation, preserves sharper image details. Regularizing the standard gradient to align with the guided gradient can therefore improve the quality of saliency maps and the overall interpretability of the model.
Methodology: The authors introduce a regularization term in the loss function that encourages the standard gradient with respect to the input image to be similar to the guided gradient. During training, both gradients are computed and a regularization loss penalizes the discrepancy between them, as formalized below.
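In symbols, the training objective plausibly takes the following form, where f_y(x) is the logit for the target class y, g^guided(x) is the guided-backpropagation gradient, E is an error function measuring gradient discrepancy, and λ is the regularization coefficient (the notation here is assumed for illustration, not taken verbatim from the paper):

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{CE}}\big(f(x),\, y\big)
\;+\; \lambda\, E\big(\nabla_x f_y(x),\; g^{\mathrm{guided}}(x)\big)
```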
Experiments: The authors evaluate their approach using ResNet-18 and MobileNet-V2 models on the CIFAR-100 dataset. They show that their method improves the quality of saliency maps generated by various CAM-based interpretability methods, as measured by faithfulness and causality metrics, while maintaining classification accuracy.
Ablation studies: The authors analyze the impact of different error functions and the regularization coefficient, finding that the cosine similarity error function and a coefficient of 7.5 × 10⁻³ work best.
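To make the mechanics concrete, here is a minimal PyTorch sketch of one training step under these best-performing settings. The names GuidedReLU, guided_model, and training_step are illustrative assumptions rather than the authors' code, and building the guided gradient via a weight-sharing model copy is just one plausible realization.

```python
import torch
import torch.nn.functional as F

class GuidedReLU(torch.autograd.Function):
    """ReLU whose backward pass zeroes gradient entries where either the
    forward input or the incoming gradient is negative (guided backprop)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x > 0) * (grad_out > 0)

def training_step(model, guided_model, images, labels, lam=7.5e-3):
    """One training step: cross-entropy plus the gradient-alignment term.

    Assumption: `guided_model` shares weights with `model` but uses
    GuidedReLU in place of ReLU, so its input gradient is the guided one.
    """
    images = images.clone().requires_grad_(True)

    logits = model(images)
    ce = F.cross_entropy(logits, labels)

    # Standard input gradient of the target-class logits;
    # create_graph=True makes the regularizer itself differentiable.
    score = logits.gather(1, labels[:, None]).sum()
    g_std = torch.autograd.grad(score, images, create_graph=True)[0]

    # Guided gradient, used as a fixed (detached) alignment target.
    guided_score = guided_model(images).gather(1, labels[:, None]).sum()
    g_gui = torch.autograd.grad(guided_score, images)[0].detach()

    # Cosine-similarity error between the two per-image gradient maps.
    cos = F.cosine_similarity(g_std.flatten(1), g_gui.flatten(1), dim=1)
    return ce + lam * (1.0 - cos).mean()
```

In practice, guided_model could be built by swapping each nn.ReLU in a copy of the network for a module that calls GuidedReLU.apply; the paper may implement the guided pass differently, for example with backward hooks.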
Overall, the proposed training approach effectively aligns the standard gradient with the guided gradient, yielding more interpretable saliency maps without compromising the model's predictive performance.