Key Concepts
Learning the training data distribution can significantly improve the accuracy of small interpretable models, making them competitive with specialized techniques.
Summary
The paper presents a general strategy for building accurate small interpretable models by learning the training data distribution. This strategy, called Compaction by Adaptive Sampling (COAS), iteratively learns the parameters of the training distribution to maximize the accuracy of the model on a held-out validation set.
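The iterative loop can be sketched roughly as follows. This is a hypothetical illustration of the adaptive-sampling idea, not the paper's actual algorithm: the specific update rule (upweighting training points the current small model misclassifies) and the hyperparameter values are assumptions made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical COAS-style sketch: adapt per-sample weights over the
# training data so a size-constrained model scores better on a
# held-out validation set. The multiplicative upweighting of
# misclassified points is an illustrative choice, not the paper's
# actual parameterization of the training distribution.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

weights = np.ones(len(y_tr))
best_acc, best_model = -1.0, None
for _ in range(10):
    # Small, interpretable model: a tree capped at 8 leaves.
    model = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0)
    model.fit(X_tr, y_tr, sample_weight=weights)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_acc, best_model = acc, model
    # Emphasize training points the small tree currently gets wrong,
    # then renormalize so weights stay on a stable scale.
    weights *= np.where(model.predict(X_tr) != y_tr, 1.5, 1.0)
    weights /= weights.mean()
```

The key point the sketch captures is that the model class stays fixed and small; only the effective training distribution changes across iterations, guided by validation accuracy.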
The authors evaluate this strategy on three tasks:
- Building cluster explanation trees: COAS improves the performance of the traditional CART decision tree algorithm to be competitive with specialized techniques like Iterative Mistake Minimization (IMM) and Expanding Explainable k-Means Clustering (ExShallow).
- Prototype-based classification: COAS boosts the accuracy of the traditional Radial Basis Function Network (RBFN) to be on par with specialized techniques like ProtoNN and Stochastic Neighbor Compression (SNC).
- Classification using Random Forests (RF): COAS significantly improves the performance of standard RF, making it competitive with specialized techniques like Optimal Tree Ensembles (OTE) and subforest-by-prediction.
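As a concrete instance of the first task, the standard CART baseline for cluster explanation can be sketched as below (a scikit-learn illustration, not the paper's code): fit a clustering, then fit a small tree to predict the cluster labels, so each leaf yields an interpretable description of a cluster.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

# Illustrative cluster-explanation baseline: k-means followed by a
# small CART tree that mimics the cluster assignments. COAS would
# additionally reweight/resample the training data to improve how
# faithfully the size-constrained tree explains the clustering.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

tree = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0)
tree.fit(X, clusters)
agreement = tree.score(X, clusters)  # fraction of points whose leaf matches their cluster
```

Specialized methods like IMM and ExShallow build such trees directly; the summary's claim is that COAS closes much of the gap while keeping plain CART as the model.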
The authors show that the strategy is general, since it applies across different model families and notions of model size, and effective, since it produces results competitive with specialized techniques tailored to each specific task.
Statistics
The number of trees in a Random Forest and the maximum depth of the trees are important factors that determine the model size.
The number of prototypes used in prototype-based classification is a key factor that determines the model size.
The number of leaves in a decision tree is a measure of the model size for explainable clustering.
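These size notions map directly onto hyperparameters in common libraries. As a scikit-learn illustration (the values are arbitrary examples, not the paper's settings):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Model-size knobs corresponding to the measures above:
# - Random Forest size: number of trees and maximum depth.
# - Decision tree size (e.g. for explainable clustering): leaf count.
small_forest = RandomForestClassifier(n_estimators=10, max_depth=4)
small_tree = DecisionTreeClassifier(max_leaf_nodes=8)
```

Prototype-based models (RBFN, ProtoNN, SNC) expose the analogous knob as the number of prototypes, though there is no standard scikit-learn estimator for them.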
Quotes
"Learning the training data distribution can significantly improve the accuracy of small interpretable models, making them competitive with specialized techniques."
"This strategy is general, as it can be applied to different models and notions of model size, and effective, as it can produce results that are competitive with specialized techniques tailored to specific tasks."