Basic Concepts
OCTree is a novel framework that leverages large language models (LLMs) and decision tree reasoning to automate the generation of effective features for tabular data, improving the performance of various prediction models.
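As a rough illustration of how LLM proposals and tree-based feedback could fit together, here is a minimal sketch of a generic propose-and-evaluate loop for feature generation. It rests on heavy assumptions. The LLM is replaced by a hypothetical stub, `propose_rule`, that walks a fixed pool of candidate rules. A one-split decision stump stands in for both the prediction model and the decision tree whose fitted split is rendered as a short text explanation. All names (`propose_rule`, `fit_stump`, `optimize_features`, `CANDIDATES`) and the toy data are illustrative, and this is not OCTree's actual prompting or evaluation pipeline.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[float, float]

# Hypothetical toy tabular data: the label is 1 when x1 * x2 > 0.2.
ROWS: List[Row] = [(i / 10, j / 10) for i in range(10) for j in range(10)]
LABELS: List[int] = [1 if x1 * x2 > 0.2 else 0 for x1, x2 in ROWS]

# Hypothetical pool of candidate feature rules; an LLM would write these in practice.
CANDIDATES: Dict[str, Callable[[Row], float]] = {
    "x1": lambda r: r[0],
    "x1 + x2": lambda r: r[0] + r[1],
    "x1 * x2": lambda r: r[0] * r[1],
    "x1 - x2": lambda r: r[0] - r[1],
}


def propose_rule(history: List[dict]) -> Optional[str]:
    """Hypothetical stand-in for an LLM call: return the next untried rule.

    A real system would build a prompt from `history` (past rules, scores,
    tree explanations); this stub only uses it to skip rules already tried.
    """
    tried = {h["rule"] for h in history}
    for name in CANDIDATES:
        if name not in tried:
            return name
    return None


def fit_stump(values: List[float], labels: List[int]) -> Tuple[float, float, bool]:
    """Fit a one-split decision tree; return (accuracy, threshold, positive_above)."""
    best = (0.0, 0.0, True)
    for t in sorted(set(values)):
        for above in (True, False):
            # above=True predicts 1 when v > t; above=False predicts 1 when v <= t.
            preds = [int((v > t) == above) for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, t, above)
    return best


def optimize_features(max_steps: int = 10) -> Tuple[str, float, List[dict]]:
    """Propose rules, score each with a stump, and keep the best-scoring one."""
    history: List[dict] = []
    for _ in range(max_steps):
        rule = propose_rule(history)
        if rule is None:
            break
        values = [CANDIDATES[rule](r) for r in ROWS]
        acc, t, above = fit_stump(values, LABELS)
        # Text form of the fitted split: a crude stand-in for tree "reasoning".
        side = ">" if above else "<="
        explanation = f"if {rule} {side} {t:.2f} then predict 1"
        history.append({"rule": rule, "score": acc, "tree": explanation})
    best = max(history, key=lambda h: h["score"])
    return best["rule"], best["score"], history


rule, score, history = optimize_features()
print(rule, score)
```

On this toy data only the product feature separates the classes with a single split, so the loop keeps `x1 * x2`. In a fuller system the `history` entries, with their scores and tree explanations, would presumably be fed back into the LLM prompt; here they only record which rules were tried.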
Statistics
On the Tesla Stock dataset, OCTree with Llama 2 reduced XGBoost's relative error by 15.9%.
With GPT-4o, OCTree achieved a 17.1% relative error reduction for XGBoost on the same dataset.
OCTree outperforms CAAFE running on GPT-4o, even when OCTree itself uses a custom Llama 2 model fine-tuned on open dialogue data.
Across 19 classification tasks whose datasets lack language descriptions, OCTree reduces relative prediction error by an average of 5.0% compared with the baseline XGBoost model.
Combining OCTree with OpenFE further boosts performance, achieving a 7.9% reduction in relative error for XGBoost.
Features generated using XGBoost with OCTree can be transferred to improve the performance of MLP and HyperFast models.
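The figures above are relative error reductions. Assuming the usual definition, (baseline error minus new error) divided by baseline error, the arithmetic can be sketched as below, together with a per-task average, which is one plausible reading of the 19-task figure. The function names and the error values are hypothetical and are not taken from the study; for classification tasks, "error" would typically be something like 1 minus accuracy.

```python
from statistics import mean
from typing import Sequence


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


def average_reduction(baseline: Sequence[float], improved: Sequence[float]) -> float:
    """Mean of per-task relative reductions, weighting every task equally."""
    if len(baseline) != len(improved) or not baseline:
        raise ValueError("need equal-length, non-empty error lists")
    return mean(relative_error_reduction(b, n) for b, n in zip(baseline, improved))


# Hypothetical error values for illustration only.
print(relative_error_reduction(40.0, 30.0))           # 25.0
print(average_reduction([40.0, 10.0], [30.0, 9.0]))   # 17.5
```

Averaging per-task percentages weights each task equally regardless of its error scale; pooling raw errors across tasks first would give a different number, so the two should not be mixed when comparing results.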