Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation
GOLD, a task-agnostic data generation and knowledge distillation framework, employs an iterative out-of-distribution-guided feedback mechanism to improve the generalizability of distilled small language models.
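The iterative feedback loop described above can be sketched in miniature. This is a hypothetical illustration, not GOLD's actual implementation: the names `teacher_generate`, `ood_score`, and `gold_style_loop` are invented here, the teacher is a stub that emits labeled pairs, and "training" is reduced to memorization. The structure it shows is the core idea: in each round, rank the teacher's generations by an out-of-distribution signal relative to the current student, train the student, and feed the hardest examples back as seeds for the next generation round.

```python
def teacher_generate(prompts, n=4):
    # Stand-in for a large teacher LM producing labeled (input, label) pairs.
    return [(f"{p}/s{i}", len(p) % 2) for p in prompts for i in range(n)]

def ood_score(student, x):
    # Toy proxy for an OOD signal: 1.0 if the student has never seen x.
    return 0.0 if x in student else 1.0

def distill_round(student, prompts, k=2):
    data = teacher_generate(prompts)
    # Rank generations by how out-of-distribution they are for the student...
    hardest = sorted(data, key=lambda xy: -ood_score(student, xy[0]))[:k]
    # ...then "train" the student (here: memorize the teacher's labels).
    student.update(dict(data))
    # Feedback: the most OOD inputs seed the next round of generation.
    return student, [x for x, _ in hardest]

def gold_style_loop(seed_prompts, rounds=3):
    student, prompts = {}, list(seed_prompts)
    for _ in range(rounds):
        student, prompts = distill_round(student, prompts)
    return student
```

Because generation is steered toward regions where the student currently fails, each round expands the student's coverage rather than resampling data it already handles, which is the intuition behind the claimed generalizability gains.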