The paper proposes a novel framework called "Incubator" to generate text classifiers based on user instructions. The key ideas are:
Instruction-Tuning: The authors collect instruction-data pairs from public classification datasets, augment them via in-context learning (ICL), and fine-tune a large language model (LLM) on the result to obtain the "Incubator". This allows the Incubator to generate training data for text classifiers according to user-provided instructions.
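As a rough illustration of what one instruction-data pair might look like, the sketch below turns a labeled dataset plus its description into a single training record. The function name `build_instruction_pair`, the `instruction`/`output` field names, and the JSON layout (one example text per label) are assumptions made for this example, not details confirmed by the summary.

```python
import json
import random


def build_instruction_pair(description, labeled_texts, rng):
    """Build one hypothetical instruction-tuning record.

    description   -- natural-language description of the classification task
    labeled_texts -- iterable of (text, label) tuples from an existing dataset
    rng           -- random.Random instance, used to pick one text per label
    """
    # Group the dataset's texts by their label.
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(text)

    # Assumed target format: one sampled text for every label.
    sample = {label: rng.choice(texts) for label, texts in sorted(by_label.items())}

    # The LLM would be tuned to map the instruction to the serialized sample.
    return {"instruction": description, "output": json.dumps(sample)}


pair = build_instruction_pair(
    "Classify movie reviews as positive or negative.",
    [("Loved it", "positive"), ("Terrible", "negative")],
    random.Random(0),
)
print(pair["instruction"])
print(pair["output"])
```

Serializing one text per label keeps every generated sample balanced across classes, which is one plausible reason to prefer this layout over emitting single labeled texts.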
Self-Diversification: To address the potential bias and lack of diversity in the generated data, the authors introduce a self-diversification technique. It utilizes a text embedder to identify semantically diverse samples and incorporates them into the instruction-tuning process.
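The idea of picking semantically diverse samples from embeddings can be sketched as follows. The summary does not say how diversity is measured, so this example uses greedy farthest-point selection over cosine distance as a stand-in; the function names and the toy two-dimensional vectors are hypothetical, and a real setup would take its vectors from a text embedder.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_diverse(embeddings, k):
    """Greedily pick up to k indices whose embeddings are far from each other."""
    n = len(embeddings)
    if n == 0 or k <= 0:
        return []

    chosen = [0]  # start from the first sample (an arbitrary choice)
    # min_dist[i] holds the distance from sample i to its nearest chosen sample.
    min_dist = [cosine_distance(e, embeddings[0]) for e in embeddings]

    while len(chosen) < min(k, n):
        remaining = [i for i in range(n) if i not in chosen]
        # Pick the sample that is farthest from everything chosen so far.
        next_idx = max(remaining, key=lambda i: min_dist[i])
        chosen.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], cosine_distance(e, embeddings[next_idx]))
    return chosen


# Toy embeddings: samples 0 and 1 are near-duplicates, as are samples 2 and 3.
toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
print(select_diverse(toy, 2))
```

With the toy vectors, the selection skips the near-duplicate of the first sample and jumps to the other group. The selected samples could then be fed back into instruction-tuning as described above.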
The experiments demonstrate that the Incubator can:
The authors also provide comprehensive analyses on the efficiency, robustness, and scalability of the Incubator framework.
Key Insights Extracted From
by Letian Peng,... at arxiv.org 04-18-2024
https://arxiv.org/pdf/2404.10877.pdf
Deeper Questions