The paper introduces TINYLLM, a new knowledge distillation approach that addresses two key limitations in existing methods: limited knowledge diversity and lack of rich contextual information.
To solve these issues, TINYLLM employs the following innovations:
In-context Example Generator: This component generates contextually appropriate examples that help the teacher language models better understand the task and produce more accurate rationales.
Teacher-forcing Chain-of-Thought: TINYLLM integrates the correct answer into the input, enabling the teacher models to generate credible rationales that reflect the true underlying reasoning process.
Multi-teacher Learning: TINYLLM distills knowledge from multiple large teacher language models, allowing the student model to inherit a broader range of skills and knowledge than single-teacher approaches (a minimal training-loop sketch follows this list).
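Taken together, these components suggest a simple training loop: a teacher-forcing prompt (in-context examples plus the gold answer) elicits a rationale from each teacher offline, and the student is then fine-tuned to produce both the answer and every teacher's rationale. The sketch below illustrates one such loop in Python; the student checkpoint, teacher identifiers, prompt wording, and equal loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of TINYLLM-style multi-teacher rationale distillation.
# Model names, prompt templates, and the loss weighting below are assumptions
# for illustration, not the authors' exact setup.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

STUDENT = "google/flan-t5-base"                 # small student model (assumed choice)
TEACHERS = ["teacher-llm-a", "teacher-llm-b"]   # placeholder teacher identifiers

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
student = AutoModelForSeq2SeqLM.from_pretrained(STUDENT)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)


def teacher_forcing_prompt(question: str, answer: str, examples: list[str]) -> str:
    """Build a teacher prompt that includes in-context examples AND the gold
    answer, so the teacher explains why the answer is correct (teacher-forcing CoT)."""
    demos = "\n".join(examples)
    return (f"{demos}\n\nQuestion: {question}\nAnswer: {answer}\n"
            f"Explain step by step why this answer is correct:")


def distillation_step(question: str, answer: str, rationales: dict[str, str]) -> float:
    """One gradient step: the student learns to predict the answer and to
    reproduce each teacher's rationale (multi-teacher learning).

    `rationales` maps a teacher identifier to the rationale it generated
    offline from the teacher-forcing prompt above.
    """
    losses = []

    # Answer-prediction loss.
    enc = tokenizer(f"Question: {question}", return_tensors="pt")
    ans_ids = tokenizer(answer, return_tensors="pt").input_ids
    losses.append(student(**enc, labels=ans_ids).loss)

    # One rationale-generation loss per teacher.
    for rationale in rationales.values():
        enc = tokenizer(f"Explain: {question}", return_tensors="pt")
        rat_ids = tokenizer(rationale, return_tensors="pt").input_ids
        losses.append(student(**enc, labels=rat_ids).loss)

    loss = torch.stack(losses).mean()   # equal weighting across objectives is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the teachers never appear at training time: their rationales are generated once, beforehand, and the student is optimized on a joint objective over the answer and all teacher rationales, which is what lets a much smaller model absorb reasoning signals from several larger ones.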
The authors conduct extensive experiments on six datasets across two reasoning tasks (commonsense and biomedical). The results show that TINYLLM significantly outperforms full fine-tuning (+5.07% to +15.69%), teacher models (+0.82% to +23.40%), and state-of-the-art distillation methods (+10.00% to +11.79%), while using a considerably smaller model size (1.1% to 26.0% of the teacher models).
The paper also includes efficiency analyses, ablation studies, parameter sensitivity tests, and case studies to validate the effectiveness and superiority of the proposed TINYLLM framework.