The paper explores the use of large language models (LLMs) for data preprocessing (DP), focusing on instruction-tuning local LLMs to serve as universal DP task solvers. It introduces the Jellyfish dataset, built from manually crafted instructions for DP tasks, which also enhances model interpretability. Experiments show that Jellyfish models outperform state-of-the-art methods on both seen and unseen tasks, demonstrating their competitiveness and generalizability. The paper also analyzes how tuning with different datasets affects DP performance, highlighting the importance of multi-task tuning for improving overall results.
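To make the instruction-tuning idea concrete, the sketch below shows the general pattern of casting a DP task (here, entity matching) as a natural-language instruction that an LLM can answer. This is an illustrative example, not the paper's exact prompt template; the record fields and wording are hypothetical.

```python
# Illustrative sketch: serialize a pair of records into an instruction
# prompt for entity matching, the general pattern behind instruction-
# tuning an LLM as a DP task solver. Not the paper's actual template.

def build_em_prompt(record_a: dict, record_b: dict) -> str:
    """Serialize two records into an instruction asking whether they match."""
    def serialize(rec: dict) -> str:
        # Flatten a record into "key: value" pairs, e.g. "name: iPhone 13, price: 799"
        return ", ".join(f"{k}: {v}" for k, v in rec.items())

    return (
        "Instruction: Determine whether the two product records refer to "
        "the same real-world entity. Answer Yes or No.\n"
        f"Record A: {serialize(record_a)}\n"
        f"Record B: {serialize(record_b)}\n"
        "Answer:"
    )

prompt = build_em_prompt(
    {"name": "Apple iPhone 13", "price": "799"},
    {"name": "iPhone 13 (Apple)", "price": "799.00"},
)
print(prompt)
```

Pairs of such prompts and gold answers form the instruction data; because the instructions are written in natural language, the tuned model's outputs remain human-readable, which is what the paper means by improved interpretability.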
Key insights from the paper by Haochen Zhan..., arxiv.org, 03-14-2024
https://arxiv.org/pdf/2312.01678.pdf