Core Concepts
DELIFT is an algorithm that significantly reduces the data required to fine-tune large language models (LLMs) without compromising performance. It combines a pairwise utility metric with submodular optimization to select data efficiently across different fine-tuning stages, and it achieves results comparable to or better than those obtained with the full dataset.
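To make the selection idea concrete, here is a minimal sketch of greedy subset selection over a pairwise utility matrix. It is an illustration under stated assumptions, not DELIFT's actual implementation: the facility-location objective is one common submodular function chosen here for simplicity, the function name `greedy_facility_location` is hypothetical, and the toy `utility` matrix stands in for whatever pairwise utility scores the method computes (entry `[i, j]` is assumed to be a non-negative score for how useful sample `j` is for sample `i`).

```python
import numpy as np


def greedy_facility_location(utility, budget):
    """Greedily pick up to `budget` columns of a non-negative utility matrix.

    Objective (an assumption, not confirmed by the summary):
        f(S) = sum_i max_{j in S} utility[i, j]
    """
    n = utility.shape[0]
    selected = []
    chosen = np.zeros(n, dtype=bool)
    # best_cover[i]: highest utility any already-selected sample gives sample i
    best_cover = np.zeros(n)
    for _ in range(min(budget, n)):
        # Marginal gain in f(S) from adding each candidate column j
        gains = np.maximum(utility, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[chosen] = -np.inf  # never pick the same sample twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best_cover = np.maximum(best_cover, utility[:, j])
    return selected


# Toy data (made up): samples 0-1 and samples 2-3 form two groups of similar items.
utility = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
selected = greedy_facility_location(utility, budget=2)
print(selected)
```

With a budget of two, the greedy pass takes one sample from each group, because a second sample from an already covered group adds little marginal gain. That diminishing-returns behavior is what makes submodular objectives a natural fit for pruning redundant fine-tuning data.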
Statistics
DELIFT can reduce the fine-tuning data size by up to 70% without compromising performance.
DELIFT outperforms existing data selection techniques by up to 26% in effectiveness.