Multi-Stage Balanced Distillation for Effective Knowledge Distillation from Large Language Models Under Long-Tailed Data Distributions
The BalDistill framework transfers knowledge from large language models to smaller student models under long-tailed data distributions by balancing the training data within a fixed budget: it actively selects representative examples for well-represented head domains and generates synthetic examples for scarce tail domains.
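To make the balancing idea concrete, below is a minimal single-stage sketch of budget-balanced data construction; it is not the actual multi-stage BalDistill implementation, and the helpers `generate_synthetic_example` and `score_example` are hypothetical stand-ins for teacher-LLM generation and an active-selection score.

```python
import random
from collections import defaultdict

def generate_synthetic_example(domain):
    """Hypothetical stand-in for querying the teacher LLM to synthesize a new
    example for an under-represented (tail) domain."""
    return {"domain": domain, "text": f"synthetic example for {domain}", "synthetic": True}

def score_example(example):
    """Hypothetical stand-in for an active-selection score (e.g., diversity or
    uncertainty). Random here, purely for illustration."""
    return random.random()

def build_balanced_training_set(examples, total_budget):
    """Build a domain-balanced distillation set under a fixed example budget.

    Head domains (more real examples than their share of the budget) contribute
    their highest-scoring examples; tail domains keep all real examples and are
    padded with teacher-generated synthetic ones.
    """
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)

    per_domain = total_budget // len(by_domain)  # equal share per domain
    balanced = []
    for domain, pool in by_domain.items():
        if len(pool) >= per_domain:
            # Head domain: actively select the top-scoring real examples.
            pool.sort(key=score_example, reverse=True)
            balanced.extend(pool[:per_domain])
        else:
            # Tail domain: keep all real examples, fill the gap synthetically.
            balanced.extend(pool)
            balanced.extend(generate_synthetic_example(domain)
                            for _ in range(per_domain - len(pool)))
    return balanced

if __name__ == "__main__":
    data = ([{"domain": "head", "text": f"real {i}", "synthetic": False} for i in range(90)]
            + [{"domain": "tail", "text": f"real {i}", "synthetic": False} for i in range(10)])
    train_set = build_balanced_training_set(data, total_budget=40)
    print(len(train_set))  # 40 examples, 20 per domain
```

The resulting set gives each domain an equal share of the budget, so the student model is fine-tuned on a distribution that no longer mirrors the long tail of the raw data.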