Core Concepts
Feature attributions for tree-based models depend on the attribution method used, which directly affects how the models are interpreted.
Abstract
The article examines the interpretation of tree-based machine learning models through feature attributions. It contrasts the popular TreeSHAP algorithm with marginal Shapley values and shows that the two can rank features differently. The internal structure of tree-based models is leveraged to compute feature attributions efficiently, and the complexity of computing marginal Shapley values can be reduced, especially for CatBoost models. Several game values and their implications are explored, with emphasis on the importance of implementation invariance. The article presents key insights and experiments with the XGBoost, LightGBM, and CatBoost libraries.
Introduction
Ensemble methods combine weak learners for strong performance.
Tree-based models are widely used in regulated domains.
Interpretability of ensemble models is crucial due to regulations.
Preliminaries
Game-theoretic explainers treat the features as random variables.
Shapley and Banzhaf values are key for feature attributions.
Different game values have distinct properties for explaining tree ensembles.
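To make the two game values concrete, the following minimal sketch computes Shapley and Banzhaf values by brute force for a marginal game, in which features outside a coalition are replaced by values drawn from a background sample. The toy `model`, the `background` rows, and all function names are hypothetical and chosen only for illustration; the enumeration is exponential in the number of features and is not how tree-specific algorithms work.

```python
from itertools import combinations
from math import factorial


def marginal_game(model, x, background):
    """Return v(S): the average model output when features in S are fixed
    to their values in x and the rest are taken from background rows."""
    n = len(x)

    def v(subset):
        total = 0.0
        for row in background:
            hybrid = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(hybrid)
        return total / len(background)

    return v


def game_value(v, n, weight):
    """Weighted sum of marginal contributions v(S + i) - v(S) for each feature i."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        acc = 0.0
        for k in range(n):  # k = size of the coalition S not containing i
            for s in combinations(others, k):
                acc += weight(k, n) * (v(set(s) | {i}) - v(set(s)))
        phi.append(acc)
    return phi


def shapley_weight(k, n):
    # Shapley: weight depends on the coalition size k.
    return factorial(k) * factorial(n - k - 1) / factorial(n)


def banzhaf_weight(k, n):
    # Banzhaf: every coalition gets the same weight.
    return 1.0 / 2 ** (n - 1)


# Hypothetical toy model with one interaction term and one additive term.
def model(z):
    return z[0] * z[1] + z[2]


background = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
x = (1, 1, 1)

v = marginal_game(model, x, background)
shap_vals = game_value(v, 3, shapley_weight)
banz_vals = game_value(v, 3, banzhaf_weight)
print("Shapley:", shap_vals)
print("Banzhaf:", banz_vals)
```

In this sketch the Shapley values sum to the difference between the prediction at `x` and the average prediction over the background sample (the efficiency property), while the Banzhaf values need not. Both game values agree on the purely additive third feature.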
Main Results
The path-dependent TreeSHAP algorithm is shown to lack implementation invariance.
Its feature attributions can differ between models that compute the same function but have different tree structures.
Marginal Shapley values offer a simpler approach that depends only on the model's input-output function, not on its internal tree structure.
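The implementation-invariance issue can be illustrated with a small sketch. Two hand-built trees, `TREE_A` and `TREE_B`, compute the same function but split on the features in opposite order. The path-dependent game below is a simplified reading of the path-dependent TreeSHAP idea (unknown splits are averaged using the fraction of background rows reaching each child), not the library's actual implementation; the data, tree encoding, and function names are all hypothetical.

```python
from itertools import combinations
from math import factorial

# Hypothetical background sample of two dependent features (x1, x2).
DATA = [(1, 1), (1, 1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-1, -1), (-1, -1)]


def leaf(value):
    return {"value": value}


def split(feature, left, right):
    # Rows with x[feature] > 0 go right, all others go left.
    return {"feature": feature, "left": left, "right": right}


# Both trees compute f(x) = 1 if x1 > 0 and x2 > 0 else 0.
TREE_A = split(0, leaf(0.0), split(1, leaf(0.0), leaf(1.0)))  # x1 first
TREE_B = split(1, leaf(0.0), split(0, leaf(0.0), leaf(1.0)))  # x2 first


def predict(tree, x):
    while "value" not in tree:
        tree = tree["right"] if x[tree["feature"]] > 0 else tree["left"]
    return tree["value"]


def path_dependent_value(tree, x, subset, rows):
    """Follow x at splits on known features; at other splits, average the
    children weighted by the share of `rows` that reaches each child."""
    if "value" in tree:
        return tree["value"]
    j = tree["feature"]
    right_rows = [r for r in rows if r[j] > 0]
    left_rows = [r for r in rows if r[j] <= 0]
    if j in subset:
        if x[j] > 0:
            return path_dependent_value(tree["right"], x, subset, right_rows)
        return path_dependent_value(tree["left"], x, subset, left_rows)
    out = 0.0
    for child, child_rows in ((tree["left"], left_rows), (tree["right"], right_rows)):
        if child_rows:
            share = len(child_rows) / len(rows)
            out += share * path_dependent_value(child, x, subset, child_rows)
    return out


def marginal_value(tree, x, subset, rows):
    """Average prediction with features in `subset` fixed to x, others from rows."""
    hybrids = [tuple(x[j] if j in subset else r[j] for j in range(len(x))) for r in rows]
    return sum(predict(tree, h) for h in hybrids) / len(rows)


def shapley(value_fn, n):
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        acc = 0.0
        for k in range(n):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                acc += w * (value_fn(set(s) | {i}) - value_fn(set(s)))
        phi.append(acc)
    return phi


x = (1, 1)
path_a = shapley(lambda s: path_dependent_value(TREE_A, x, s, DATA), 2)
path_b = shapley(lambda s: path_dependent_value(TREE_B, x, s, DATA), 2)
marg_a = shapley(lambda s: marginal_value(TREE_A, x, s, DATA), 2)
marg_b = shapley(lambda s: marginal_value(TREE_B, x, s, DATA), 2)

print("path-dependent, tree A:", path_a)  # [0.4375, 0.1875]
print("path-dependent, tree B:", path_b)  # [0.1875, 0.4375]
print("marginal, tree A:", marg_a)        # [0.3125, 0.3125]
print("marginal, tree B:", marg_b)        # [0.3125, 0.3125]
```

In this toy setting the path-dependent attributions swap the feature ranking between the two trees, even though both trees give identical predictions everywhere. The marginal attributions are the same for both trees because they are computed from predictions alone.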
Stats
The TreeSHAP algorithm offers path-dependent and interventional variants.
The complexity of computing marginal Shapley values is reduced for CatBoost models.
Quotes
"TreeSHAP fails to satisfy the desirable property of implementation invariance."
"Marginal Shapley values coincide, whereas TreeSHAP yields different rankings of features."