The AuG-KD method is proposed for effective knowledge transfer in out-of-domain knowledge distillation.
Multi-modal Knowledge Distillation with Prompt-Tuning enhances recommendation systems by bridging the semantic gap and reducing noise in multi-modal data.
Mutual information maximization enhances chain-of-thought (CoT) distillation, improving reasoning in smaller models.
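A minimal sketch of how such a mutual-information term might be estimated in practice, assuming an InfoNCE-style lower bound between pooled teacher and student rationale representations; the function name, temperature, and use of in-batch negatives are illustrative assumptions, not the paper's exact estimator.

```python
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(student_repr, teacher_repr, temperature=0.1):
    """InfoNCE-style lower bound on the mutual information between student
    and teacher rationale representations, using in-batch negatives.
    Minimizing this loss tightens (maximizes) the MI lower bound."""
    s = F.normalize(student_repr, dim=-1)            # (B, D)
    t = F.normalize(teacher_repr, dim=-1)            # (B, D)
    logits = s @ t.T / temperature                   # (B, B) pairwise similarities
    labels = torch.arange(s.size(0), device=s.device)  # matched pairs lie on the diagonal
    return F.cross_entropy(logits, labels)
```

In a full training objective, this term would be added (with a weighting coefficient) to the usual task loss and CoT distillation loss.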
LumiNet enhances logit-based knowledge distillation with the concept of 'perception', addressing overconfidence issues and improving knowledge extraction.
This paper improves the reliability of teacher supervision in knowledge distillation by revising the teacher's soft labels with the ground truth and selecting appropriate training samples for teacher supervision, mitigating the negative impact of incorrect teacher predictions.
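A minimal sketch of the two ideas combined, assuming a simple revision rule (swap probability mass so the true class dominates) and a confidence threshold for sample selection; `tau`, the temperature `T`, and both rules are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def revise_soft_labels(teacher_logits, labels, T=4.0):
    """Revise teacher soft labels with the ground truth: if the teacher's top
    class is wrong, swap its probability with the true class so the
    ground-truth class always dominates (one possible revision rule)."""
    probs = F.softmax(teacher_logits / T, dim=-1)
    top_prob, top_idx = probs.max(dim=-1)
    gt_prob = probs.gather(1, labels[:, None]).squeeze(1)
    revised = probs.clone()
    revised.scatter_(1, labels[:, None], top_prob[:, None])
    revised.scatter_(1, top_idx[:, None], gt_prob[:, None])
    return revised

def reliable_kd_loss(student_logits, teacher_logits, labels, T=4.0, tau=0.2):
    """Cross-entropy on all samples; distillation only on samples where the
    teacher assigns at least `tau` probability to the true class
    (the selection criterion here is illustrative)."""
    with torch.no_grad():
        revised = revise_soft_labels(teacher_logits, labels, T)
        gt_conf = F.softmax(teacher_logits, dim=-1).gather(1, labels[:, None]).squeeze(1)
        selected = (gt_conf >= tau).float()          # sample-selection mask
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    kd = (T * T) * F.kl_div(log_p_s, revised, reduction="none").sum(-1)
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    return (ce + selected * kd).mean()
```

Samples the teacher handles unreliably fall back to plain cross-entropy on the ground truth, so unreliable teacher predictions never supervise the student.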
The proposed target-aware transformer enables the student model to dynamically aggregate semantic information from the teacher model, allowing the student to mimic the teacher as a whole rather than minimizing divergences at each spatial location through one-to-one matching.
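A minimal sketch of attention-based one-to-all feature matching in this spirit, assuming student and teacher feature maps have already been projected to the same channel dimension; the scaling factor, the MSE objective, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def target_aware_distill_loss(student_feat, teacher_feat):
    """Each teacher position attends over all student positions, and the
    aggregated student feature is matched to the teacher feature at that
    position (one-to-all matching instead of one-to-one spatial matching).

    student_feat, teacher_feat: (B, C, H, W), assumed to share the channel
    dimension C after a projection layer (not shown here)."""
    B, C, H, W = teacher_feat.shape
    t = teacher_feat.flatten(2).transpose(1, 2)   # (B, HW, C) queries
    s = student_feat.flatten(2).transpose(1, 2)   # (B, HW, C) keys/values
    attn = torch.softmax(t @ s.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, HW, HW)
    aggregated = attn @ s                         # student info aggregated per teacher position
    return F.mse_loss(aggregated, t)
```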