Key Concepts
TabRepo is a comprehensive dataset for tabular model evaluations, enabling analysis of tuning strategies and ensembling to outperform AutoML systems.
Abstract
Introduction: Presents TabRepo, a dataset of predictions and metrics for 1310 models evaluated on 200 datasets.
AutoML Evolution: Overview of AutoML methods like Auto-Sklearn, TPOT, H2O AutoML, and AutoGluon.
Benchmarking Tabular Methods: Discussion of benchmarks for tabular methods, such as AMLB.
Introducing TabRepo: Details about the dataset structure and contributions.
Model Bagging: Explanation of training models with bagging for better performance estimation.
Datasets, Predictions, and Evaluations: Description of the datasets used in evaluations.
Ensembling: Methodology for building ensembles using Caruana's approach.
Comparing HPO and AutoML Systems: Analysis of tuning strategies and ensembling effects on model error.
Portfolio Learning with TabRepo: Leveraging transfer learning techniques to outperform current AutoML systems.
Broader Impact Statement: Discussion on the societal impact and ethical considerations of using large datasets for research.
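The ensembling step mentioned above (Caruana's approach) can be sketched as greedy forward selection with replacement over stored validation predictions. This is only a minimal illustration, not TabRepo's actual API: the function name `greedy_ensemble_selection`, the `(n_models, n_samples)` array layout, and the use of mean squared error as the metric are all assumptions made for the example.

```python
import numpy as np


def greedy_ensemble_selection(val_preds, y_val, n_iterations=10):
    """Caruana-style forward selection with replacement (illustrative sketch).

    val_preds: array of shape (n_models, n_samples); each row holds one
               model's validation predictions (assumed layout).
    y_val:     array of shape (n_samples,) with the validation targets.
    Returns ensemble weights of shape (n_models,) that sum to 1.
    """
    val_preds = np.asarray(val_preds, dtype=float)
    y_val = np.asarray(y_val, dtype=float)

    n_models = val_preds.shape[0]
    counts = np.zeros(n_models, dtype=int)   # how often each model was picked
    running_sum = np.zeros_like(y_val)       # sum of predictions picked so far

    for step in range(1, n_iterations + 1):
        # Ensemble mean if each candidate model were added as the next member.
        candidate_means = (running_sum + val_preds) / step
        # Assumed metric: mean squared error on the validation set.
        errors = np.mean((candidate_means - y_val) ** 2, axis=1)
        best = int(np.argmin(errors))
        counts[best] += 1
        running_sum += val_preds[best]

    # A model picked k times out of n_iterations gets weight k / n_iterations.
    return counts / counts.sum()
```

Because a model can be selected more than once, the final weights are simply selection frequencies. With precomputed predictions, as a repository of model predictions would provide, this loop needs no model retraining.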
Statistics
TabRepo contains 786,000 model predictions from 1310 models.
AMLB evaluates 1040 tasks and requires 40,000 CPU hours to evaluate a single method.
Quotes
"Ensembling allows LightGBM to match CatBoost’s accuracy."
"Our work shows that using TabRepo, one can alleviate both caveats by learning default configurations which improves accuracy and latency when matching compute budget."