The paper explores the use of large language models (LLMs) for data preprocessing (DP), focusing on instruction-tuning local LLMs to serve as universal DP task solvers. The Jellyfish dataset is introduced, built from manually crafted instructions for DP tasks, which also enhances model interpretability. Experiments show that Jellyfish models outperform state-of-the-art methods on both seen and unseen tasks, demonstrating their competitiveness and generalizability. The paper also analyzes how tuning with different datasets affects DP performance, highlighting the importance of multi-task tuning for improving overall performance.
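To make the instruction-tuning idea concrete, the sketch below shows how a single training record for one DP task (entity matching) might be assembled. The field names, prompt wording, and helper function are illustrative assumptions, not the actual format used by the Jellyfish dataset.

```python
# Hypothetical sketch: assembling an instruction-tuning record for an
# entity-matching data preprocessing task. Field names ("instruction",
# "input", "output") and prompt text are assumptions for illustration.

def build_entity_matching_record(left: dict, right: dict, label: bool) -> dict:
    """Turn a pair of records into an instruction-tuning example."""
    instruction = (
        "You are given two product records. "
        "Decide whether they refer to the same real-world entity. "
        "Answer 'Yes' or 'No'."
    )

    def serialize(record: dict) -> str:
        # Flatten attribute/value pairs into a single prompt-friendly line.
        return "; ".join(f"{k}: {v}" for k, v in record.items())

    return {
        "instruction": instruction,
        "input": f"Record A: {serialize(left)}\nRecord B: {serialize(right)}",
        "output": "Yes" if label else "No",
    }

record = build_entity_matching_record(
    {"title": "iPhone 13 128GB", "brand": "Apple"},
    {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple"},
    label=True,
)
print(record["output"])  # → Yes
```

Records like this, spanning several DP task types, would then be fed to a standard supervised fine-tuning pipeline; the natural-language instruction is what lets a single tuned model generalize across tasks.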
Key insights extracted from the source paper
by Haochen Zhan... at arxiv.org, 03-14-2024
https://arxiv.org/pdf/2312.01678.pdf