Large language models (LLMs) show strong potential to transform data preprocessing (DP) for data mining and analytics. They achieve high accuracy on error detection, data imputation, schema matching, and entity matching, but further work is needed to address limitations in domain specificity, computational cost, and occasional factual inaccuracies.
Instruction-tuned LLMs such as Jellyfish further improve DP performance and generalizability across tasks.
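To illustrate how such tasks can be framed for an LLM, the sketch below casts entity matching as a zero-shot instruction prompt. This is a minimal example under assumed conventions, not the prompt design of any cited system; `llm_complete` is a hypothetical placeholder for whatever completion API is in use.

```python
# Minimal sketch: framing entity matching as a zero-shot LLM prompt.
# `llm_complete` is a hypothetical placeholder for any text-completion
# API (chat endpoint, local model, etc.); it is not a real library call.

def llm_complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text reply."""
    raise NotImplementedError("wire up your LLM client here")

def entity_match(record_a: dict, record_b: dict) -> bool:
    """Ask the model whether two records refer to the same real-world entity."""
    prompt = (
        "You are a data-preprocessing assistant.\n"
        "Do the following two product records refer to the same entity?\n"
        f"Record A: {record_a}\n"
        f"Record B: {record_b}\n"
        "Answer with exactly 'yes' or 'no'."
    )
    reply = llm_complete(prompt).strip().lower()
    return reply.startswith("yes")

# Illustrative record pair (invented data):
a = {"title": "Apple iPhone 13, 128 GB, blue", "price": 699.00}
b = {"title": "iPhone 13 128GB Blue (Apple)", "price": 695.00}
# entity_match(a, b) would be expected to return True for a capable model.
```

The same pattern extends to the other tasks named above: error detection, data imputation, and schema matching differ only in the instruction text and the expected answer format.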