The paper proposes a novel framework called "Incubator" to generate text classifiers based on user instructions. The key ideas are:
Instruction-Tuning: The authors collect instruction-data pairs from public classification datasets, expand them with in-context learning (ICL), and use them to instruction-tune a large language model (LLM), which becomes the "Incubator". Given a user-provided instruction, the Incubator then generates training data for a text classifier.
Self-Diversification: To counter potential bias and lack of diversity in the generated data, the authors introduce a self-diversification technique. It uses a text embedder to identify semantically diverse samples among the generated data and incorporates them into further instruction tuning.
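The summary does not say exactly how the embedder picks "semantically diverse" samples. One plausible reading is to cluster the embeddings and keep the sample closest to each cluster center, so every semantic region contributes one representative. The sketch below makes that assumption and uses plain k-means with deterministic farthest-first seeding. It is not the authors' code. The toy 2-D vectors stand in for a real text embedder, and all function and variable names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at point 0, then repeatedly add the
    point farthest from every seed chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd-style k-means over the embedding vectors."""
    centers = farthest_first_init(points, k)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster empties out
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


def select_diverse(texts: List[str], embeddings: List[Vector], k: int) -> List[str]:
    """Hypothetical selector: return the text nearest to each cluster center."""
    centers = kmeans(embeddings, k)
    chosen: List[int] = []
    for c in centers:
        best = min(range(len(texts)), key=lambda j: squared_distance(embeddings[j], c))
        if best not in chosen:  # avoid picking the same sample twice
            chosen.append(best)
    return [texts[j] for j in chosen]


# Toy "generated" samples with hand-made 2-D embeddings (assumed, not real).
toy_texts = [
    "a wonderful, moving film", "great acting and a great story", "I enjoyed every minute",
    "a dull, lifeless plot", "terrible acting throughout", "I walked out halfway",
    "shares surged after earnings", "the central bank held rates", "bond yields dipped slightly",
]
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [-1.0, 0.0], [-0.9, -0.1], [-0.8, -0.2],
    [-0.1, 1.1], [0.0, 1.0], [0.1, 0.9],
]

print(select_diverse(toy_texts, toy_embeddings, 3))
```

Picking the real sample nearest each centroid, rather than the centroid itself, keeps the output as actual text that can be fed back into instruction tuning. Farthest-first seeding is used only to make the toy run deterministic. A real pipeline might use a different clustering or diversity criterion.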
The experiments demonstrate that the Incubator can:
The authors also provide comprehensive analyses of the efficiency, robustness, and scalability of the Incubator framework.
Key insights distilled from:
by Letian Peng,... on arxiv.org, 04-18-2024
https://arxiv.org/pdf/2404.10877.pdf