Core Concepts
A simple yet effective tree-based ensemble learning approach can detect whether unseen samples come from a different distribution than the training data.
Abstract
The paper proposes a tree-based out-of-distribution (TOOD) detection method that leverages the interpretable and robust nature of tree-based ensemble models to identify whether test samples follow a distribution similar to that of the training data.
Key highlights:
- The TOOD detection mechanism computes the pairwise Hamming distance of test samples' tree embeddings, which are obtained by fitting a tree-based ensemble model on the in-distribution training samples (see the sketch after this list).
- The tree-based approach is interpretable, robust to adversarial attacks, efficient, and flexible across a variety of machine learning tasks.
- Extensive experiments show that TOOD detection outperforms other state-of-the-art out-of-distribution detection methods on tabular, image, and text data.
- The paper provides theoretical analysis to explain the intuition and correctness of the proposed approach.
- The method can be generalized to the unsupervised setting by randomly shuffling the training labels without significantly impacting the out-of-distribution detection performance.
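Below is a minimal sketch of how such a mechanism could look, assuming the tree embedding of a sample is the vector of leaf indices it receives from each tree in the ensemble and the per-sample score is the average pairwise Hamming distance (APHD) to the other test samples. The helper names, `X_train`, `y_train`, `X_test`, and `threshold` are illustrative placeholders, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tree_embeddings(forest, X):
    # Tree embedding: the leaf index assigned to each sample by every tree,
    # shape (n_samples, n_trees).
    return forest.apply(X)

def aphd_scores(embeddings):
    # Average pairwise Hamming distance: for each sample, the mean fraction
    # of trees in which it lands in a different leaf than the other samples.
    n = embeddings.shape[0]
    scores = np.empty(n)
    for i in range(n):
        differs = embeddings != embeddings[i]      # (n, n_trees) boolean
        scores[i] = differs.mean(axis=1).sum() / (n - 1)
    return scores

# Fit the tree-based ensemble on in-distribution training data
# (X_train and y_train are assumed to be available).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Score a batch of test samples; following the paper's intuition, smaller
# APHD values point towards out-of-distribution samples.
scores = aphd_scores(tree_embeddings(forest, X_test))
is_ood = scores < threshold  # threshold is a placeholder, e.g. picked on a validation split
```

For the unsupervised variant mentioned above, the same sketch applies with `y_train` replaced by a randomly shuffled copy, e.g. `np.random.permutation(y_train)`.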
Statistics
"The more in-distribution training samples there are, the larger APHD values we get for both in-distribution and out-of-distribution data."
"As data dimension increases, the expected pairwise Hamming distance also increases."
Quotes
"Inherited from the characteristics of tree-based machine learning models, the main advantages of TOOD detection are the following four aspects: Interpretability, Robustness, Flexibility, and Efficiency."
"Intuitively, in-distribution samples will have larger APHD values since samples from in-distribution are more likely to be separated by the decision boundaries obtained from training if the samples on the opposite sides of each decision boundary have different labels, which results in a larger pairwise Hamming distance."