The paper introduces TINYLLM, a new knowledge distillation approach that addresses two key limitations in existing methods: limited knowledge diversity and lack of rich contextual information.
To solve these issues, TINYLLM employs the following innovations:
In-context Example Generator: This module generates contextually appropriate examples that help the teacher language models better understand the task and produce more accurate rationales (see the prompt-construction sketch after this list).
Teacher-forcing Chain-of-Thought: TINYLLM integrates the correct answer into the teacher's input, so the teacher models generate credible rationales that are consistent with the correct answer and reflect the true underlying reasoning process.
Multi-teacher Learning: TINYLLM distills knowledge from multiple large teacher language models, allowing the student model to inherit a broader range of skills and knowledge than single-teacher approaches (a sketch of the training objective follows this list).
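The summary describes the teacher-side pipeline only at a high level. The Python sketch below illustrates how the in-context example generator and the teacher-forcing chain-of-thought prompt could fit together; the function names, prompt wording, and example format are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of teacher-side prompt construction (names and prompt
# wording are illustrative assumptions, not the paper's exact implementation).

def generate_in_context_examples(task_description: str, k: int = 2) -> list[str]:
    """Placeholder for the in-context example generator.

    In practice this would query an LLM (or sample from the training set)
    to produce k task-appropriate worked examples.
    """
    return [f"[example {i + 1} for task: {task_description}]" for i in range(k)]


def build_teacher_prompt(question: str,
                         answer: str,
                         in_context_examples: list[str]) -> str:
    """Assemble a rationale-elicitation prompt for a teacher LLM.

    The in-context examples ground the teacher in the task format, and the
    ground-truth answer is included (teacher forcing) so the generated
    rationale explains why the correct answer holds rather than justifying
    a possibly wrong prediction.
    """
    demo_block = "\n\n".join(in_context_examples)
    return (
        f"{demo_block}\n\n"
        f"Question: {question}\n"
        f"Correct answer: {answer}\n"
        f"Explain step by step why this answer is correct."
    )


# Usage: one rationale request per (question, answer) pair and per teacher.
examples = generate_in_context_examples("commonsense QA")
prompt = build_teacher_prompt(
    question="Where would you most likely find a stapler?",
    answer="office",
    in_context_examples=examples,
)
print(prompt)
```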
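For the multi-teacher step, a plausible training objective combines an answer-prediction loss with a rationale-imitation loss averaged over teachers. The sketch below is a hedged reconstruction under that assumption; the `StudentLM` interface, the averaging scheme, and the `alpha` weight are illustrative and may differ from the paper's exact loss.

```python
# A minimal sketch of a multi-teacher distillation objective. The student
# interface and loss weighting are assumptions made for illustration only.
from dataclasses import dataclass
from typing import Protocol


class StudentLM(Protocol):
    def loss(self, source: str, target: str) -> float:
        """Cross-entropy of generating `target` given `source`
        (e.g. a small seq2seq model trained with a standard LM loss)."""
        ...


@dataclass
class DistillationExample:
    question: str
    answer: str
    teacher_rationales: list[str]  # one rationale per teacher LLM


def multi_teacher_loss(student: StudentLM,
                       ex: DistillationExample,
                       alpha: float = 1.0) -> float:
    """Train the student to (a) produce the correct answer and (b) reproduce
    the rationale of every teacher, so it inherits skills from all of them."""
    answer_loss = student.loss(ex.question, ex.answer)
    rationale_loss = sum(
        student.loss(ex.question, r) for r in ex.teacher_rationales
    ) / len(ex.teacher_rationales)
    return answer_loss + alpha * rationale_loss
```

Averaging the rationale losses keeps any single teacher from dominating the gradient; the paper's actual weighting between the answer and rationale terms may differ.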
The authors conduct extensive experiments on six datasets across two reasoning tasks (commonsense and biomedical). The results show that TINYLLM significantly outperforms full fine-tuning (+5.07% to +15.69%), the teacher models (+0.82% to +23.40%), and state-of-the-art distillation methods (+10.00% to +11.79%), while using only 1.1% to 26.0% of the teacher models' size.
The paper also includes efficiency analyses, ablation studies, parameter sensitivity tests, and case studies to validate the effectiveness and superiority of the proposed TINYLLM framework.
Key ideas extracted from the source content by Yijun Tian et al. at arxiv.org, 04-02-2024: https://arxiv.org/pdf/2402.04616.pdf