Core Concepts
Pruned layer-wise relevance propagation (PLRP) generates sparser and more interpretable explanations of deep neural network predictions by pruning the relevance scores propagated through each layer, while maintaining the conservation property of layer-wise relevance propagation (LRP).
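As a quick illustration of the conservation property that PLRP preserves, here is a minimal NumPy sketch with made-up relevance values (not from the paper): the total relevance in every layer must equal the model's prediction score.

```python
import numpy as np

# Hypothetical relevance scores for two consecutive layers of a DNN.
# LRP's conservation property: the relevance in every layer sums to
# the model's prediction score f_c*(x) for the explained class c*.
prediction_score = 7.3
relevance_layer_l   = np.array([3.1, 2.0, 1.4, 0.8])       # layer l
relevance_layer_lm1 = np.array([2.5, 2.2, 1.1, 0.9, 0.6])  # layer l-1

assert np.isclose(relevance_layer_l.sum(), prediction_score)
assert np.isclose(relevance_layer_lm1.sum(), prediction_score)
```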
Abstract
The paper introduces a modification of the layer-wise relevance propagation (LRP) method, called pruned layer-wise relevance propagation (PLRP), to generate sparser and more interpretable explanations for deep neural network (DNN) predictions.
Key highlights:
PLRP prunes the relevance propagation in each layer by zeroing out relevance scores that fall below a threshold, determined either as a fixed proportion of the scores or by a sparsity-gain criterion.
The pruned relevance is then redistributed among the remaining neurons to preserve the relevance conservation property of LRP.
Two variants of PLRP are proposed: PLRP-λ, which rescales the remaining relevance scores, and PLRP-M, which modifies the relevance propagation matrix (PLRP-λ is sketched after this list).
Evaluation on image classification (ImageNet, ECSSD) and genomic sequence classification tasks shows that PLRP generates sparser explanations that concentrate relevance more strongly on the most important features than the LRP baseline.
The sparsity gain is achieved with only a slight decrease in faithfulness, as the pruning mainly affects the less important features.
PLRP-λ generally outperforms PLRP-M in terms of sparsity, localization, and robustness.
The sparser explanations generated by PLRP can help to better identify and interpret the most important features for the model's predictions, especially for high-dimensional data like genomic sequences.
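The following is a minimal sketch of the PLRP-λ idea under simplifying assumptions: a fixed proportion of the smallest absolute relevance scores in a layer is pruned, and the survivors are rescaled by a factor λ so that the layer total is conserved. The function name, fixed-proportion threshold, and proportional rescaling are illustrative choices rather than the authors' implementation; PLRP-M instead folds the correction into the relevance propagation matrix.

```python
import numpy as np

def plrp_lambda_prune(relevance, prune_fraction=0.5):
    """Prune the smallest relevance scores in one layer and rescale
    the survivors so that the total relevance is conserved.

    A sketch of the PLRP-lambda idea, not the paper's implementation:
    the paper also supports a sparsity-gain criterion for choosing the
    threshold, and a PLRP-M variant that modifies the propagation
    matrix instead of rescaling the scores.
    """
    r = np.asarray(relevance, dtype=float)
    total = r.sum()  # must be preserved (conservation property)

    # Threshold as a fixed proportion: zero out the k smallest |R_j|.
    k = int(prune_fraction * r.size)
    threshold = np.sort(np.abs(r))[k] if k > 0 else 0.0
    pruned = np.where(np.abs(r) >= threshold, r, 0.0)

    # Rescale the survivors so their sum equals the original total,
    # i.e. redistribute the pruned relevance proportionally.
    lam = total / pruned.sum()
    return lam * pruned

r_layer = np.array([0.05, 1.2, -0.02, 0.6, 0.01, 0.9])
r_sparse = plrp_lambda_prune(r_layer, prune_fraction=0.5)
assert np.isclose(r_sparse.sum(), r_layer.sum())  # conservation holds
```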
Stats
When the most relevant features are perturbed first, the prediction score f_c*(x) drops as steeply as for the LRP baseline.
The difference in faithfulness AUC is instead driven by the less important features, which are perturbed later (see the sketch below).
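These statistics refer to the standard most-relevant-first perturbation protocol. A hedged sketch of the faithfulness-AUC computation follows, where model, x, and baseline_value are hypothetical placeholders; the paper's exact perturbation and normalization details may differ.

```python
import numpy as np

def faithfulness_auc(model, x, relevance, baseline_value=0.0):
    """Perturb input features in order of decreasing relevance and
    return the area under the prediction-score curve; a steeper early
    drop (smaller AUC) indicates a more faithful explanation.

    `model` is assumed to be a callable returning the score f_c*(x)
    of the explained class -- a hypothetical interface for this sketch.
    """
    order = np.argsort(relevance)[::-1]   # most relevant features first
    x_pert = np.array(x, dtype=float, copy=True)
    scores = [model(x_pert)]
    for idx in order:
        x_pert[idx] = baseline_value      # "remove" the feature
        scores.append(model(x_pert))
    # Rectangle-rule AUC of the perturbation curve, normalized by length.
    return float(np.mean(scores))
```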
Quotes
"Sparsification of the explanation might be desirable in the sense that it reduces noise and the number of features with non-zero relevance, i.e., highlights only the most important features."
"Instead of global explanations of the model, our focus is on local methods. Their general idea is to obtain an input-specific explanation of the decisive behavior of the model by attributing relevance scores to every input dimension based on the model's prediction."