Counterfactual explanations are generated under an explicit cardinality constraint, which limits how many features may differ from the original input, so that explanations of machine learning model predictions are sparser and easier to interpret.
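To make the idea of a cardinality constraint concrete, here is a minimal sketch in Python. It is not the method summarized above: it assumes a toy linear classifier and a brute-force search over feature subsets, and the names `predict`, `sparse_counterfactual`, `k`, `step`, and `max_steps` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def predict(x, w, b):
    """Toy linear classifier (an assumption): class 1 if w.x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)


def sparse_counterfactual(x, w, b, k, step=0.1, max_steps=100):
    """Brute-force search for a counterfactual that changes at most k features.

    Subsets are tried in order of increasing size, so the first size that
    yields a label flip gives the sparsest counterfactual found. Among
    subsets of that size, the one with the smallest L2 change is returned.
    Returns None if no flip is found within the step budget.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    target = 1 - predict(x, w, b)
    for size in range(1, k + 1):
        best, best_dist = None, np.inf
        for subset in combinations(range(len(x)), size):
            idx = list(subset)
            # Move only the selected features, in the direction that pushes
            # the linear score toward the target class.
            direction = np.zeros_like(x)
            direction[idx] = w[idx] if target == 1 else -w[idx]
            norm = np.linalg.norm(direction)
            if norm == 0:
                continue
            direction /= norm
            for t in range(1, max_steps + 1):
                candidate = x + t * step * direction
                if predict(candidate, w, b) == target:
                    if t * step < best_dist:
                        best, best_dist = candidate, t * step
                    break
        if best is not None:
            return best
    return None


# Usage: flip the prediction while changing at most one feature.
w = np.array([1.0, 2.0, -1.0])
x = np.zeros(3)
cf = sparse_counterfactual(x, w, b=-1.0, k=1)
print("changed features:", np.flatnonzero(cf != x))
```

The exhaustive search grows combinatorially with the number of features, so this sketch is only practical for small inputs. It is meant to show what the constraint restricts, not how to solve the problem efficiently.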
Gradient-based white-box attribution methods produce noisy explanations because of high-frequency artifacts inherited from the gradient signal. FORGrad addresses this by applying a low-pass filter tailored to the network architecture and attribution method. It consistently improves the faithfulness of white-box methods, allowing them to rival computationally intensive black-box approaches.
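The filtering step can be illustrated with a short NumPy sketch. It assumes an ideal (hard-cutoff) low-pass filter applied in the Fourier domain to a 2D attribution map, with a hand-picked `cutoff`. The summary above says the filter is tailored to the architecture and attribution method, and how that is done is not shown here. The function name `low_pass_filter` and the synthetic map are hypothetical.

```python
import numpy as np


def low_pass_filter(attribution, cutoff):
    """Remove spatial frequencies above `cutoff` (cycles per pixel) from a 2D map.

    Assumption: an ideal circular mask in the Fourier domain. `cutoff` is a
    free parameter here, chosen by hand for the example.
    """
    h, w = attribution.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, w)
    keep = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(attribution)
    # The input is real and the mask is symmetric, so the imaginary part
    # of the inverse transform is numerical noise and can be discarded.
    return np.real(np.fft.ifft2(spectrum * keep))


# Synthetic stand-in for a gradient map: a smooth pattern plus a
# checkerboard, which is the highest-frequency pattern on a pixel grid.
rows, cols = np.indices((16, 16))
smooth = np.sin(2 * np.pi * rows / 16)
checkerboard = (-1.0) ** (rows + cols)
noisy_map = smooth + 0.5 * checkerboard

filtered = low_pass_filter(noisy_map, cutoff=0.25)
print("max deviation from smooth pattern:", np.abs(filtered - smooth).max())
```

A hard cutoff is the simplest choice for a sketch. A smoother mask (for example a Gaussian roll-off) would reduce ringing near sharp edges, at the cost of a less clearly defined cutoff frequency.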