Core Concepts
Adversarial learning and implicit regularization improve self-knowledge distillation by aligning the student's predictive distribution with that of a pre-trained model.
Abstract
The paper introduces AI-KD, a self-knowledge distillation method built on adversarial learning and implicit regularization. It discusses the motivation behind the approach, details the methodology, and evaluates its effectiveness on various datasets, comparing AI-KD with existing self-KD methods on standard performance metrics.
Directory:
Introduction to Knowledge Distillation Methods
Knowledge distillation (KD) compresses models by transferring knowledge from a teacher network to a student network (a minimal loss sketch follows the directory).
Self-Knowledge Distillation (Self-KD)
Trains a network using its own predictions as the teacher signal, which acts as a regularizer and improves generalization.
Proposed AI-KD Methodology
Combines adversarial learning and implicit regularization to align the predictive distributions of a pre-trained model and the student model (an illustrative sketch follows the directory).
Experiment Results
Evaluation of AI-KD on coarse-grained and fine-grained datasets with different network architectures.
Comparison with Representative Self-KD Methods
Performance comparison of AI-KD with CS-KD, TF-KD, PS-KD, TF-FD, and Zipf's LS on various datasets.
Implementation Details and Metrics Used
Details about datasets, evaluation metrics, implementation environment, and parameters used in experiments.
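For readers unfamiliar with the loss referenced in the KD entry above, the following is a minimal sketch of the standard soft-target KD objective: cross-entropy on hard labels plus a temperature-scaled KL term toward the teacher. The function name and the hyperparameters T and alpha are illustrative placeholders, not values from the paper; in self-KD the teacher logits come from the network itself (for example, its own earlier predictions or a pre-trained copy) rather than from a separate, larger model.

```python
# Minimal, illustrative KD loss (Hinton-style soft targets).
# Hyperparameters T and alpha are placeholders, not the paper's values.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Cross-entropy on hard labels plus temperature-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, targets)
    # Soften both distributions with temperature T before the KL term;
    # the T*T factor restores the gradient scale, as in the original KD formulation.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl
```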
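The AI-KD entry above is only a one-line description, so the sketch below is a hedged reading of the general idea rather than the paper's exact formulation: a small discriminator is trained to tell the pre-trained model's predictive distribution from the student's, the student is trained to fool it (the adversarial term), and a KL term toward the pre-trained model acts as the implicit regularizer. The class name Discriminator and the weights lambda_adv and lambda_kl are assumptions introduced here for illustration.

```python
# Hedged sketch of adversarial alignment between a pre-trained model and a
# student, as described at a high level in the directory entry above.
# Names and weights (Discriminator, lambda_adv, lambda_kl, T) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores a softmax distribution as pre-trained-model output (1) or student output (0)."""
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, probs):
        return self.net(probs)  # raw logit, paired with BCEWithLogitsLoss

def student_loss(student, pretrained, disc, x, y, T=4.0, lambda_kl=1.0, lambda_adv=0.1):
    """Illustrative student objective: supervised CE + KL alignment + adversarial term."""
    s_logits = student(x)
    with torch.no_grad():
        p_logits = pretrained(x)  # frozen pre-trained model provides the target distribution

    ce = F.cross_entropy(s_logits, y)

    # Implicit regularization: softly align the student to the pre-trained model.
    kl = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(p_logits / T, dim=1),
                  reduction="batchmean") * (T * T)

    # Adversarial term: the student tries to make its predictive distribution
    # indistinguishable from the pre-trained model's, so the discriminator
    # should label it as "pre-trained" (target = 1).
    s_probs = F.softmax(s_logits, dim=1)
    target_real = torch.ones(x.size(0), 1, device=x.device)
    adv = F.binary_cross_entropy_with_logits(disc(s_probs), target_real)

    return ce + lambda_kl * kl + lambda_adv * adv
```

In such a setup the discriminator would be updated in an alternating step with the opposite labels, as in standard GAN training.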
Stats
The proposed AI-KD records a 19.87% Top-1 error with PreAct ResNet-18.
The Top-5 error rate is 4.81% on the CIFAR-100 dataset with the ResNet-18 architecture.