The paper proposes a novel framework called "Incubator" to generate text classifiers based on user instructions. The key ideas are:
Instruction-Tuning: The authors collect instruction-data pairs from public classification datasets, augment them using in-context learning (ICL), and fine-tune a large language model (LLM) on the result to serve as the "Incubator". This allows the Incubator to generate training data for text classifiers according to user-provided instructions.
Self-Diversification: To address the potential bias and lack of diversity in the generated data, the authors introduce a self-diversification technique. It utilizes a text embedder to identify semantically diverse samples and incorporates them into the instruction-tuning process.
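The idea of picking semantically diverse samples with a text embedder can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a toy bag-of-words stand-in for a real text embedder, and greedy farthest-point selection is swapped in as one simple way to choose samples that are far apart in embedding space. The names `embed`, `cosine_distance`, and `select_diverse` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a text embedder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(texts, k):
    """Greedily pick up to k texts, each as far as possible from those already picked."""
    if not texts or k <= 0:
        return []
    vectors = [embed(t) for t in texts]
    chosen = [0]  # start from the first sample (arbitrary choice)
    # min_dist[i] = distance from sample i to its nearest chosen sample
    min_dist = [cosine_distance(vectors[0], v) for v in vectors]
    min_dist[0] = -1.0  # mark as chosen so it is never picked again
    while len(chosen) < min(k, len(texts)):
        nxt = max(range(len(texts)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], cosine_distance(vectors[nxt], v))
        min_dist[nxt] = -1.0
    return [texts[i] for i in chosen]


generated = [
    "the movie was great",
    "the movie was great fun",
    "stock prices fell sharply",
    "the film was great",
]
diverse = select_diverse(generated, 2)
```

In this toy run, the second pick is the sample that shares no words with the first, so near-duplicates are passed over. In a setting like the one the summary describes, the selected samples would then be fed back into instruction-tuning; that step is outside this sketch.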
The experiments demonstrate that the Incubator can:
The authors also provide comprehensive analyses on the efficiency, robustness, and scalability of the Incubator framework.
Key insights extracted from the paper by Letian Peng,... (arxiv.org, 04-18-2024)
https://arxiv.org/pdf/2404.10877.pdf