Core Concept
Decoupling the global logit output into multi-scale local logit outputs enables finer-grained knowledge transfer and improves student performance.
Summary
The content discusses the limitations of conventional logit-based distillation methods and introduces Scale Decoupled Distillation (SDD) to address them. SDD decouples the global logit output into multi-scale local logit outputs, allowing more precise knowledge transfer. It further divides the decoupled knowledge into consistent and complementary parts and weights the complementary part more heavily, improving the student's ability to discriminate ambiguous samples. Extensive experiments demonstrate the effectiveness of SDD across a wide range of teacher-student pairs, especially on fine-grained classification tasks.
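To make the decoupling concrete, here is a minimal PyTorch-style sketch (not the authors' code) of how global logits can be split into multi-scale local logits: the final feature map is average-pooled at several scales and the same linear classifier head is reused on each pooled region. The function name `local_logits`, the scale set `(1, 2, 4)`, and the tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def local_logits(feat, classifier, scales=(1, 2, 4)):
    """feat: (B, C, H, W) backbone feature map; classifier: nn.Linear(C, K).

    Returns (B, N, K) local logits, where N = sum(s*s for s in scales);
    scale 1 (the first block of regions) recovers the ordinary global logit.
    """
    outs = []
    for s in scales:
        pooled = F.adaptive_avg_pool2d(feat, s)      # (B, C, s, s)
        pooled = pooled.flatten(2).transpose(1, 2)   # (B, s*s, C)
        outs.append(classifier(pooled))              # (B, s*s, K)
    return torch.cat(outs, dim=1)                    # (B, N, K)
```

With teacher and student local logits aligned region by region, the usual logit distillation loss can then be applied per region rather than once on the global output.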
Directory:
- Abstract
  - Logit knowledge distillation challenges.
  - Introduction of Scale Decoupled Distillation (SDD).
- Introduction
  - Overview of knowledge distillation techniques.
  - Categorization into logit-based and feature-based distillation.
- Methodology
  - Notation and description of conventional knowledge distillation (see the loss sketch after this list).
  - Description of Scale Decoupled Knowledge Distillation (SDD).
- Experiments
  - Experimental setups on benchmark datasets.
  - Comparison results with various teacher-student pairs.
- Conclusion
  - Summary of findings and contributions.
- Appendix
  - Ablation study on different aspects of SDD methodology.
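As a reference point for the Methodology entry above, conventional logit distillation uses the standard temperature-softened KL divergence between teacher and student global logits. The sketch below assumes a PyTorch setup; the temperature value `tau=4.0` is chosen only for illustration.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau=4.0):
    """Standard logit KD: KL(teacher || student) on temperature-softened global logits (B, K)."""
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    # The tau**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau * tau)
```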
Statistics
"Extensive experiments on several benchmark datasets demonstrate the effectiveness of SDD for wide teacher-student pairs."
"For most teacher-student pairs, SDD can contribute to more than 1% performance gain on small or large-scale datasets."
Quotes
"We propose a simple but effective method, i.e., Scale Decoupled Distillation (SDD), for logit knowledge distillation."
"By increasing the weight of complementary parts, SDD can guide the student to focus more on ambiguous samples, improving its discrimination ability."