Core Concepts
Providing probabilistic guarantees that counterfactual explanations remain valid under potential model changes.
Abstract
The paper studies the problem of generating counterfactual explanations for neural networks that remain valid when the underlying model changes. It formalizes the notion of naturally-occurring model change and proposes a measure, called Stability, to quantify the robustness of a counterfactual. Ensuring that counterfactual explanations stay valid after model updates is essential for maintaining trust in algorithmic decision-making. Experimental results show that the proposed algorithms, T-Rex:I and T-Rex:NN, generate robust counterfactuals with high validity and realistic outcomes.
Stats
∥Params(M) − Params(m)∥ < ∆ (bounded change in model parameters: the restrictive notion of model change considered in prior work, where M is the changed model and m the original).
γₓ is the local Lipschitz constant of the model m around x (used in the Stability measure below).
E[M(X) | X = x] = E[M(x)] = m(x) (under a naturally-occurring model change, the changed model's expected prediction at x matches the original model m).
Var[M(X) | X = x] = Var[M(x)] = νₓ (the variance of the changed model's prediction at x).
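One way to make this definition concrete is to simulate naturally-occurring model changes by retraining the same architecture on similar data and checking how often a counterfactual keeps its favorable label. The following is a minimal sketch of such a check, assuming scikit-learn's MLPClassifier, NumPy arrays X and y with binary labels where class 1 is the favorable outcome, and illustrative names (validity_under_retraining, n_models) that are not from the paper.

import numpy as np
from sklearn.neural_network import MLPClassifier

def validity_under_retraining(X, y, x_cf, n_models=20, seed=0):
    # Simulate naturally-occurring model changes by retraining the
    # same architecture on bootstrap resamples of the training data.
    rng = np.random.default_rng(seed)
    kept = 0
    for i in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        m_i = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                            random_state=i).fit(X[idx], y[idx])
        # Count the changed models under which x_cf keeps the favorable label.
        kept += int(m_i.predict(x_cf.reshape(1, -1))[0] == 1)
    return kept / n_models  # fraction of retrained models where x_cf stays valid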
Stability: Rₖ,σ²(x, m) = (1/k) Σᵢ₌₁ᵏ (m(xᵢ) − γₓ∥x − xᵢ∥), where x₁, …, xₖ are drawn i.i.d. from N(x, σ²I).
Estimated Stability (no Lipschitz constant needed): R̂ₖ,σ²(x, m) = (1/k) Σᵢ₌₁ᵏ (m(xᵢ) − |m(x) − m(xᵢ)|); a sketch of this estimator follows.
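A minimal sketch of the estimated Stability measure above, assuming the model m is a Python callable returning a scalar score in [0, 1]; the function name estimate_stability and its default parameters are illustrative, not from the paper.

import numpy as np

def estimate_stability(m, x, k=100, sigma=0.1, seed=None):
    # Monte Carlo estimate of R̂_{k,σ²}(x, m): sample k points around x
    # from N(x, σ²I) and penalize local variation of the model output,
    # avoiding the need to compute a local Lipschitz constant γₓ.
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    samples = x + sigma * rng.standard_normal((k, x.shape[0]))
    m_x = m(x)
    m_samples = np.array([m(xi) for xi in samples])
    return float(np.mean(m_samples - np.abs(m_x - m_samples)))

# Toy usage: keep a counterfactual only if its Stability clears a chosen
# threshold, which the paper links to validity under naturally-occurring
# model changes with high probability.
model = lambda z: 1.0 / (1.0 + np.exp(-z.sum()))  # toy scorer on R^d
print(estimate_stability(model, np.array([0.5, 1.2, -0.3]), seed=0))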
Quotes
"There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly."
"Our main contribution is to show that counterfactuals with sufficiently high value of Stability as defined by our measure will remain valid after potential 'naturally-occurring' model changes with high probability."