The paper explores the use of large language models (LLMs) for data preprocessing (DP), focusing on instruction-tuning local LLMs to serve as universal DP task solvers. It introduces the Jellyfish dataset, built from manually crafted instructions for DP tasks, which also enhances model interpretability. Experiments show that Jellyfish models outperform state-of-the-art methods on both seen and unseen tasks, demonstrating their competitiveness and generalizability. The paper further analyzes how tuning with different datasets affects DP performance, highlighting the importance of multi-task tuning for improving overall performance.
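To make the instruction-tuning setup more concrete, below is a minimal Python sketch of how a single DP task instance (entity matching, in this case) could be serialized into an instruction/response pair for tuning a local LLM. The prompt wording, the record serialization, and the `build_em_example` helper are illustrative assumptions, not the exact format used to construct the Jellyfish dataset.

```python
# Minimal sketch (an assumption, not the paper's exact prompt format) of turning
# one entity-matching instance into an instruction/response pair for tuning.

def build_em_example(record_a: dict, record_b: dict, label: bool) -> dict:
    """Serialize an entity-matching pair into a prompt and expected response."""
    instruction = (
        "You are an expert in data preprocessing. Decide whether the two "
        "records refer to the same real-world entity. Answer 'Yes' or 'No'."
    )

    def serialize(record: dict) -> str:
        # Flatten a record into "attribute: value" pairs.
        return "; ".join(f"{k}: {v}" for k, v in record.items())

    prompt = (
        f"{instruction}\n\n"
        f"Record A: {serialize(record_a)}\n"
        f"Record B: {serialize(record_b)}"
    )
    response = "Yes" if label else "No"
    return {"prompt": prompt, "response": response}


if __name__ == "__main__":
    example = build_em_example(
        {"title": "Apple iPhone 13, 128GB", "price": "699"},
        {"title": "iPhone 13 128 GB (Apple)", "price": "699.00"},
        label=True,
    )
    print(example["prompt"])
    print("Expected answer:", example["response"])
```

Collecting many such instruction/response pairs across several DP tasks (error detection, data imputation, schema/entity matching, etc.) is what enables the multi-task tuning the paper identifies as important for overall performance.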
Key insights extracted from the paper by Haochen Zhan... at arxiv.org, 03-14-2024.
Source: https://arxiv.org/pdf/2312.01678.pdf