The paper introduces the novel concept of "explanatory multiverse" to address the limitations of current counterfactual explanation approaches. Counterfactual explanations are popular for interpreting decisions of opaque machine learning models, but existing methods treat each counterfactual path independently, neglecting the spatial relations between them.
The authors formalize explanatory multiverse, which encompasses all possible counterfactual journeys and their geometric properties, such as affinity, branching, divergence, and convergence. They propose two methods to navigate and reason about this multiverse:
Vector space interpretation: Counterfactual paths are represented as normalized vectors, allowing comparison of journeys of varying lengths. Branching points are identified, and directional differences between paths are computed.
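As an illustration, here is a minimal sketch of the vector-space view. The helper names (`path_direction`, `directional_difference`) and the end-minus-start normalisation are assumptions made for this example, not the authors' implementation: each journey is collapsed into a unit direction vector, which makes paths of different lengths comparable and quantifies how strongly two journeys pull apart after a branching point.

```python
import numpy as np

def path_direction(path: np.ndarray) -> np.ndarray:
    """Collapse a counterfactual path (a sequence of data points) into a
    single unit-length direction vector, so that journeys of different
    lengths become directly comparable."""
    direction = path[-1] - path[0]  # net displacement from factual to counterfactual
    return direction / np.linalg.norm(direction)

def directional_difference(path_a: np.ndarray, path_b: np.ndarray) -> float:
    """Angle (radians) between the normalised directions of two paths:
    0 means the journeys head the same way, pi means opposite ways."""
    cos = np.clip(path_direction(path_a) @ path_direction(path_b), -1.0, 1.0)
    return float(np.arccos(cos))

# Two toy 2-D counterfactual journeys branching from the same factual point.
factual = np.array([0.0, 0.0])
path_a = np.vstack([factual, [1.0, 0.2], [2.0, 0.5]])  # moves mostly along x1
path_b = np.vstack([factual, [0.3, 1.0], [0.4, 2.1]])  # moves mostly along x2

print(directional_difference(path_a, path_b))  # ~1.14 rad: the journeys diverge
```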
Directed graph interpretation: Counterfactual paths are modeled as a directed graph, where vertices represent data points and edges capture feature changes. This approach accounts for feature monotonicity and allows quantifying branching factors and loss of opportunity.
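The graph view can likewise be sketched in a few lines. The toy graph below, built with `networkx`, is a hypothetical example rather than the paper's code: vertices stand for data points, directed edges for admissible feature changes (edge direction can encode monotonicity constraints, e.g. a feature like age that may only increase), the out-degree of a vertex serves as its branching factor, and loss of opportunity is read off as the counterfactual endpoints that become unreachable once a step is committed to.

```python
import networkx as nx

# Toy explanatory multiverse: vertices are data points, directed edges are
# admissible feature changes that respect monotonicity constraints.
G = nx.DiGraph()
G.add_edges_from([
    ("x0", "a"), ("x0", "b"),    # branching point at the factual instance x0
    ("a", "cf1"), ("a", "cf2"),  # step 'a' keeps two counterfactuals reachable
    ("b", "cf3"),                # step 'b' commits to a single counterfactual
])
counterfactuals = {"cf1", "cf2", "cf3"}

def branching_factor(node: str) -> int:
    """Number of distinct onward journeys available at this vertex."""
    return G.out_degree(node)

def reachable_cfs(node: str) -> set:
    """Counterfactual endpoints still attainable from `node`."""
    return counterfactuals & (nx.descendants(G, node) | {node})

# Loss of opportunity of a step: counterfactuals that become unreachable.
for step in ["a", "b"]:
    lost = reachable_cfs("x0") - reachable_cfs(step)
    print(step, "branching factor:", branching_factor(step), "lost:", lost)
```

Run as-is, the sketch reports that taking step "a" preserves two reachable counterfactuals while step "b" forfeits them both, which is the intuition behind preferring high-branching paths.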
The key benefits of explanatory multiverse include spatially-aware explanations that account for the journey as well as its destination, and greater agency for explainees, who can compare alternative paths and favour those with high branching factors, thereby deferring the choice of a final counterfactual and limiting loss of opportunity.
The authors demonstrate the capabilities of their approaches on synthetic and real-world data sets, and discuss future research directions, such as incorporating complex dynamics and explanation representativeness.