This paper proposes a novel approach that lets Large Language Models (LLMs) autonomously identify and select high-quality "cherry data" samples from large open-source datasets to improve instruction-tuning performance.
The key highlights are:
The authors introduce a self-guided process that begins with familiarizing the model with a small subset of the dataset during the "Learning from Brief Experience" phase. This lays the groundwork for the subsequent "Evaluating Based on Experience" phase.
In the "Evaluating Based on Experience" phase, the authors introduce the Instruction-Following Difficulty (IFD) score, a metric that evaluates how much the instruction context helps the model generate the corresponding response. The IFD score is used to identify the most impactful training samples.
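The IFD computation can be sketched as a ratio of two average losses over the response tokens: the loss when the model sees the instruction plus the response, versus the loss on the response alone. A score near (or above) 1 means the instruction barely helps the model predict the response, marking a harder, more informative sample. A minimal sketch, assuming per-token log-probabilities have already been extracted from a causal LM (the function names here are illustrative, not from the paper's code):

```python
def mean_nll(token_logprobs):
    """Average negative log-likelihood (cross-entropy) over response tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def ifd_score(logprobs_with_instruction, logprobs_response_only):
    """Instruction-Following Difficulty: conditioned loss / direct loss.

    Both inputs are per-token log-probs of the SAME response tokens,
    scored with and without the instruction as context.
    """
    conditioned = mean_nll(logprobs_with_instruction)  # s(A | Q)
    direct = mean_nll(logprobs_response_only)          # s(A)
    return conditioned / direct

# Toy example with made-up per-token log-probs of one response:
with_instr = [-0.5, -0.4, -0.6]   # model conditioned on the instruction
resp_only = [-1.2, -1.0, -1.1]    # model sees the response alone
score = ifd_score(with_instr, resp_only)  # ~0.45: instruction helps a lot
```

In practice the two loss terms would come from two forward passes of the same pre-experienced model, masking the loss to the response tokens in each case.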
In the final "Retraining from Self-Guided Experience" phase, the authors use the data with relatively large IFD scores as the "cherry data" to train their final model, resulting in what they call "cherry models".
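The selection step reduces to ranking samples by IFD and keeping the top fraction. A minimal sketch, assuming precomputed scores; the cutoff discarding scores at or above 1 (where the instruction appears to mislead rather than guide the model) is an assumption drawn from common implementations of this idea, and the helper name is hypothetical:

```python
def select_cherry_data(samples, ifd_scores, top_fraction=0.05):
    """Keep the samples with the largest IFD scores below 1.

    Assumption: scores >= 1 indicate the instruction hurts prediction of
    the response, so those samples are filtered out before ranking.
    """
    valid = [(s, sc) for s, sc in zip(samples, ifd_scores) if sc < 1.0]
    valid.sort(key=lambda pair: pair[1], reverse=True)  # hardest first
    k = max(1, int(len(valid) * top_fraction))
    return [s for s, _ in valid[:k]]

# Toy usage: keep the top 70% of valid samples by IFD.
samples = ["a", "b", "c", "d"]
scores = [0.9, 0.3, 1.2, 0.8]
cherries = select_cherry_data(samples, scores, top_fraction=0.7)
```

With `top_fraction` set to 0.05–0.10, this mirrors the 5–10% data budget reported in the experiments below.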
Extensive experimental results on the Alpaca and WizardLM datasets validate the efficacy of the proposed method. The authors demonstrate that their cherry models outperform the official Alpaca model and the reimplemented WizardLM model, using only 5-10% of the original data.
The authors also provide insights into the distribution and pattern characteristics of the selected cherry data, highlighting its distinct properties compared to the overall dataset.