Core Concepts
PROPLACE: a method that leverages robust optimization techniques to generate counterfactual explanations that are provably robust to model parameter changes and plausible with respect to the training data distribution.
Summary
The content discusses the problem of generating counterfactual explanations (CEs) for neural network classifiers that are both robust to model parameter changes and plausible with respect to the training data distribution.
The key highlights are:
Counterfactual explanations (CEs) are modified inputs that a classifier assigns to a different class than the original input. CEs should satisfy desirable properties such as validity, proximity, and plausibility.
Existing methods that target robustness to model parameter changes do not simultaneously optimize for proximity and plausibility, which limits their practical applicability.
The authors propose PROPLACE, a method that leverages robust optimization techniques to generate CEs that are provably robust and plausible.
PROPLACE formulates the problem as a bi-level optimization, with an outer minimization to optimize proximity and an inner maximization to certify robustness.
The authors provide formal guarantees of soundness and completeness for their method, and prove its convergence.
Experiments on benchmark datasets show that PROPLACE achieves state-of-the-art performance in terms of robustness and plausibility, while maintaining competitive proximity.
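The bi-level structure described above can be illustrated with a minimal sketch: an outer loop that minimizes the distance from the original input, and an inner check that (approximately, by sampling) certifies validity against all models within a bounded parameter perturbation. This is not PROPLACE itself — the linear classifier, the sampled inner maximization, and all function names here are illustrative assumptions, whereas the paper works with neural networks and exact certification.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(w, b, x):
    """Toy linear classifier (an assumption for illustration):
    class 1 iff w.x + b > 0."""
    return float(np.dot(w, x) + b) > 0.0

def worst_case_valid(x_cf, w, b, delta, n_samples=200):
    """Inner maximization, approximated by sampling: the CE counts as
    robustly valid only if every perturbed model within an
    infinity-norm ball of radius delta still classifies it as 1.
    (PROPLACE certifies this exactly; sampling is a stand-in.)"""
    for _ in range(n_samples):
        dw = rng.uniform(-delta, delta, size=w.shape)
        db = rng.uniform(-delta, delta)
        if not predict(w + dw, b + db, x_cf):
            return False
    return True

def find_robust_ce(x, w, b, delta, step=0.05, max_iter=500):
    """Outer minimization: take the smallest number of steps along the
    direction that most increases the classifier score until the point
    is robustly valid, keeping the change to x small (proximity)."""
    direction = w / np.linalg.norm(w)
    x_cf = x.copy()
    for _ in range(max_iter):
        if worst_case_valid(x_cf, w, b, delta):
            return x_cf
        x_cf = x_cf + step * direction
    return None  # no robust CE found within the iteration budget

# Toy run: the origin is classified 0; search for a nearby robust CE.
w = np.array([1.0, 1.0])
b = -1.0
x = np.array([0.0, 0.0])
x_cf = find_robust_ce(x, w, b, delta=0.05)
```

The sketch illustrates why robustness and proximity trade off: the outer loop stops at the first robustly valid point, so a larger perturbation radius `delta` forces the CE further from the original input.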
Stats
The content does not contain any explicit numerical data or statistics. It focuses on the methodological aspects of the proposed PROPLACE approach.