Proposes AuG-KD, a method for effective knowledge transfer in Out-of-Domain Knowledge Distillation.
Multi-modal Knowledge Distillation with Prompt-Tuning enhances recommendation systems by bridging the semantic gap and reducing noise in multi-modal data.
Mutual information maximization enhances CoT distillation for improved reasoning in smaller models.
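As a rough illustration of the idea (not the paper's actual estimator), one common way to maximize a mutual-information lower bound between teacher and student chain-of-thought representations is a contrastive, InfoNCE-style objective; the function name, pooled representations, and temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def infonce_mi_loss(student_repr, teacher_repr, temperature=0.1):
    # student_repr, teacher_repr: (batch, dim) pooled representations of the same rationales.
    s = F.normalize(student_repr, dim=1)
    t = F.normalize(teacher_repr, dim=1)
    logits = s @ t.T / temperature                       # similarity of every student-teacher pair
    labels = torch.arange(s.size(0), device=s.device)    # matching pairs are the positives
    # Minimizing this cross-entropy maximizes an InfoNCE lower bound on mutual information.
    return F.cross_entropy(logits, labels)
```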
LumiNet introduces a novel approach to knowledge distillation by enhancing logit-based distillation with the concept of 'perception', addressing overconfidence issues and improving knowledge extraction.
This paper improves the reliability of the teacher's supervision in knowledge distillation by revising the teacher's soft labels with the ground truth and by selecting which training samples the teacher should supervise, thereby mitigating the negative impact of the teacher's incorrect predictions.
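A minimal sketch of the two ingredients described above, assuming a simple blending scheme and a correctness-based sample filter (the exact revision and selection rules in the paper may differ):

```python
import torch
import torch.nn.functional as F

def revised_kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Teacher's soft predictions at temperature T.
    teacher_probs = F.softmax(teacher_logits / T, dim=1)

    # Label revision: blend the teacher distribution with the one-hot ground truth
    # so the correct class receives sufficient probability mass.
    one_hot = F.one_hot(targets, num_classes=teacher_logits.size(1)).float()
    revised = alpha * teacher_probs + (1.0 - alpha) * one_hot

    # Sample selection: only distill on samples the teacher classifies correctly.
    keep = teacher_logits.argmax(dim=1).eq(targets)

    student_log_probs = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(student_log_probs, revised, reduction="none").sum(dim=1)
    kd = (kd * keep).sum() / keep.sum().clamp(min=1)

    ce = F.cross_entropy(student_logits, targets)
    return ce + (T ** 2) * kd
```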
The proposed target-aware transformer enables the student model to dynamically aggregate semantic information from the teacher model, allowing the student to mimic the teacher as a whole rather than minimizing each partial divergence in a one-to-one spatial matching fashion.
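A hedged sketch of the general mechanism: each student feature position attends over all teacher positions, so the distillation target becomes a dynamically aggregated teacher feature rather than the feature at the same spatial location; shapes, projections, and the loss form are illustrative and differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def target_aware_feature_loss(student_feat, teacher_feat):
    # student_feat, teacher_feat: (batch, positions, dim), assumed already projected to the same size.
    scale = student_feat.size(-1) ** 0.5
    attn = torch.softmax(student_feat @ teacher_feat.transpose(1, 2) / scale, dim=-1)  # (B, P, P)
    aggregated_teacher = attn @ teacher_feat                                           # (B, P, dim)
    # The student matches a teacher target aggregated across all positions,
    # rather than a one-to-one spatial match.
    return F.mse_loss(student_feat, aggregated_teacher)
```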
Bridging the accuracy gap between teacher and student models in knowledge distillation is crucial for effective learning, and a dynamic teacher with bidirectional mappings achieves this effectively, yielding significant performance improvements in compact student models.
In knowledge distillation, maintaining an appropriate accuracy gap between the teacher and student models can improve the student model's accuracy.
This paper proposes GPD (Gap Preserving Distillation), a new method that improves the efficiency of knowledge transfer by effectively managing the performance gap between the teacher and student models during distillation.
Progressive distillation, a technique where a student model learns from intermediate checkpoints of a teacher model, accelerates training by implicitly providing a curriculum of easier-to-learn subtasks, as demonstrated through theoretical analysis and empirical results on sparse parity, probabilistic context-free grammars (PCFGs), and real-world language modeling tasks.
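A minimal sketch of progressive distillation as described above: the student is distilled against a sequence of intermediate teacher checkpoints rather than only the final teacher. The checkpoint paths, model objects, data loader, and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def progressive_distill(student, teacher, checkpoint_paths, loader, epochs_per_stage=1, T=2.0):
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for path in checkpoint_paths:                      # earlier -> later teacher checkpoints
        teacher.load_state_dict(torch.load(path))
        teacher.eval()
        for _ in range(epochs_per_stage):
            for x, y in loader:
                with torch.no_grad():
                    t_logits = teacher(x)
                s_logits = student(x)
                # Standard temperature-scaled KD loss against the current checkpoint.
                kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                              F.softmax(t_logits / T, dim=1),
                              reduction="batchmean") * T * T
                loss = F.cross_entropy(s_logits, y) + kd
                opt.zero_grad()
                loss.backward()
                opt.step()
    return student
```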