Core Concept
Logit standardization improves student performance in knowledge distillation by letting the student match the teacher's essential logit relations instead of the teacher's logit magnitudes.
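A minimal sketch of the idea in Python, assuming the standardization is a per-sample z-score (subtract the mean of the logit vector, divide by its standard deviation) applied to both teacher and student logits before a softmax with a shared temperature, followed by the usual KL-divergence distillation loss. The function names, the eps guard, and the temperature value are illustrative assumptions, not the authors' implementation. Only the standard library is used so the example runs as-is.

```python
import math


def standardize(logits, eps=1e-7):
    """Z-score a logit vector: zero mean, unit standard deviation (assumed pre-process)."""
    n = len(logits)
    mean = sum(logits) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / n)
    # eps guards against division by zero when all logits are equal
    return [(x - mean) / (std + eps) for x in logits]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0, standardize_logits=True):
    """KL(teacher || student) on softened distributions, scaled by T^2 as in vanilla KD.

    Hypothetical helper: with standardize_logits=True both logit vectors are
    z-scored first, so only their relative pattern (not their magnitude) matters.
    """
    if standardize_logits:
        student_logits = standardize(student_logits)
        teacher_logits = standardize(teacher_logits)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


teacher = [8.0, 2.0, -4.0]
student = [2.0, 0.5, -1.0]  # same relations as the teacher, one quarter the magnitude

print(kd_loss(student, teacher))                            # ~0: relations already match
print(kd_loss(student, teacher, standardize_logits=False))  # > 0: penalizes the magnitude gap
```

Because the student's logits here are an exact rescaling of the teacher's, the standardized loss is essentially zero while the raw-logit loss is clearly positive, which mirrors the point that the student does not need to reproduce the teacher's logit magnitude.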
Statistics
"The standardized student logits have arbitrary magnitude suitable for the student’s capacity while preserving the essential relations learned from the teacher."
"The ratio between the temperatures of student and teacher equals the ratio between the standard deviations of their predicted logits for a well-distilled student."
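The temperature ratio in the second statement can be derived in a few lines, under the assumption that "well-distilled" means the student's softened distribution exactly matches the teacher's. The notation is ours: z is the teacher's logit vector, v the student's, T_t and T_s their temperatures, a bar denotes the mean over classes, and \sigma(\cdot) the standard deviation over classes (assumed nonzero).

```latex
\begin{aligned}
\operatorname{softmax}(v / T_s) = \operatorname{softmax}(z / T_t)
  &\iff \frac{v}{T_s} = \frac{z}{T_t} + c\,\mathbf{1} \ \text{ for some } c \in \mathbb{R}
  \qquad \text{(softmax is invariant only to constant shifts)} \\
  &\implies \frac{\sigma(v)}{T_s} = \frac{\sigma(z)}{T_t}
  \qquad \text{(std ignores shifts and scales by } 1/T,\ T > 0) \\
  &\implies \frac{T_s}{T_t} = \frac{\sigma(v)}{\sigma(z)} .
\end{aligned}
```

Substituting v = (T_s / T_t) z + c T_s \mathbf{1} also gives (v - \bar{v}) / \sigma(v) = (z - \bar{z}) / \sigma(z): after standardization the student's logits coincide with the teacher's whatever the student's logit magnitude, which is the sense in which standardized logits keep the relations while leaving the magnitude free.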
Quotes
"Our pre-process enables student to focus on essential logit relations from teacher rather than requiring a magnitude match."