Core Concepts
Logit standardization improves student performance in knowledge distillation by letting the student focus on the teacher's essential logit relations instead of matching logit magnitudes.
Abstract
Knowledge distillation transfers soft labels from teacher to student through a softmax that shares the same temperature on both sides.
Logit standardization addresses a side effect of this setup: the shared temperature implicitly mandates an exact magnitude match between teacher and student logits.
A Z-score pre-process lets the student learn the essential logit relations from the teacher without matching magnitudes (a minimal sketch follows below).
Extensive evaluation on CIFAR-100 and ImageNet shows significant performance improvements.
The proposed method outperforms state-of-the-art knowledge distillation methods.
The logit standardization pre-process is released on GitHub.
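A minimal sketch of the idea in PyTorch (illustrative only, not the authors' released code; the function names and default temperature here are assumptions):

```python
import torch
import torch.nn.functional as F

def z_score(logits, eps=1e-7):
    # Z-score pre-process: remove each sample's mean and scale by its std,
    # so only the relative relations among the logits survive.
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    return (logits - mean) / (std + eps)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Both sides are standardized before the shared-temperature softmax,
    # so the student matches the teacher's logit relations, not magnitudes.
    log_p_student = F.log_softmax(z_score(student_logits) / temperature, dim=-1)
    p_teacher = F.softmax(z_score(teacher_logits) / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```

Because the standardized logits are scale-free, the student's raw logits can take whatever magnitude suits its capacity, which is the property the quotes below describe.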
Statistics
"The standardized student logits have arbitrary magnitude suitable for the student’s capacity while preserving the essential relations learned from the teacher."
"The ratio between the temperatures of student and teacher equals the ratio between the standard deviations of their predicted logits for a well-distilled student."
Quotes
"Our pre-process enables student to focus on essential logit relations from teacher rather than requiring a magnitude match."
"The standardized student logits have arbitrary magnitude suitable for the student’s capacity while preserving the essential relations learned from the teacher."