The paper proposes a novel framework called "Incubator" to generate text classifiers based on user instructions. The key ideas are:
Instruction-Tuning: The authors collect instruction-data pairs from public classification datasets, expand them through in-context learning (ICL), and use the resulting pairs to instruction-tune a large language model (LLM) as the "Incubator". The Incubator can then generate training data for a text classifier from a user-provided instruction.
Self-Diversification: To address the potential bias and lack of diversity in the generated data, the authors introduce a self-diversification technique. It utilizes a text embedder to identify semantically diverse samples and incorporates them into the instruction-tuning process.
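The Instruction-Tuning idea above starts by turning labelled classification datasets into instruction-data pairs. The following is a minimal Python sketch of that conversion only. The prompt/target template, the helper names `make_pair` and `to_tuning_example`, and the `per_label` parameter are hypothetical illustrations, not the paper's format; the ICL expansion and the fine-tuning itself are out of scope.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def make_pair(
    task_description: str,
    labelled: List[Tuple[str, str]],
    per_label: int = 2,
    seed: int = 0,
) -> Tuple[str, Dict[str, List[str]]]:
    """Hypothetical helper: group a labelled dataset into one instruction-data pair."""
    rng = random.Random(seed)  # fixed seed so sampling is reproducible
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in labelled:
        by_label[label].append(text)
    labels = sorted(by_label)
    # The instruction describes the task and names its label set.
    instruction = f"{task_description} Labels: {', '.join(labels)}."
    # Keep at most `per_label` example texts for each label.
    data = {
        lab: rng.sample(by_label[lab], min(per_label, len(by_label[lab])))
        for lab in labels
    }
    return instruction, data


def to_tuning_example(instruction: str, data: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical template: the model reads the instruction and learns to emit labelled samples."""
    prompt = f"### Instruction:\n{instruction}\n### Samples:\n"
    target = "\n".join(f"[{lab}] {text}" for lab in sorted(data) for text in data[lab])
    return {"prompt": prompt, "target": target}


if __name__ == "__main__":
    reviews = [("Great movie!", "positive"), ("Terrible plot.", "negative"), ("Loved it.", "positive")]
    instr, data = make_pair("Classify movie reviews by sentiment.", reviews, per_label=1)
    example = to_tuning_example(instr, data)
    print(example["prompt"] + example["target"])
```

Each resulting prompt/target pair is an ordinary supervised example, so any standard instruction-tuning loop could consume it.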
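The Self-Diversification step above says only that a text embedder is used to pick out semantically diverse samples; the exact selection rule is not given here. The following minimal Python sketch assumes one plausible rule: cluster the sample embeddings with k-means and keep the sample nearest each centroid. `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real text embedder, and all function names are illustrative rather than the authors' code.

```python
import math
import zlib
from typing import Callable, List, Sequence

Vector = List[float]


def toy_embed(text: str, dim: int = 32) -> Vector:
    """Hypothetical stand-in for a real text embedder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old centroid if the cluster is empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def select_diverse(
    samples: List[str], k: int, embed: Callable[[str], Vector] = toy_embed
) -> List[str]:
    """Return up to k samples: the one closest to each k-means centroid, without repeats."""
    if not samples:
        return []
    k = min(k, len(samples))
    vecs = [embed(s) for s in samples]
    chosen: List[str] = []
    for c in kmeans(vecs, k):
        best = min(range(len(samples)), key=lambda i: sq_dist(vecs[i], c))
        if samples[best] not in chosen:
            chosen.append(samples[best])
    return chosen
```

Under this assumption, the selected samples would then serve as the diverse targets fed back into instruction tuning. The `embed` parameter lets a real embedding model replace `toy_embed` without changing the selection logic.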
The experiments demonstrate that the Incubator can:
The authors also provide comprehensive analyses of the efficiency, robustness, and scalability of the Incubator framework.
Key Insights Distilled From
by Letian Peng,... at arxiv.org 04-18-2024
https://arxiv.org/pdf/2404.10877.pdf