Contrastive Knowledge Distillation: Aligning Teacher and Student Logits from a Sample-wise Perspective
The proposed Contrastive Knowledge Distillation (CKD) approach aligns teacher and student logits from a sample-wise perspective: it simultaneously minimizes the difference between the teacher's and student's logits for the same sample (intra-sample alignment) and maximizes the dissimilarity between logits belonging to different samples (inter-sample separation).
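To make the sample-wise objective concrete, the following is a minimal sketch of a contrastive logit-alignment loss in PyTorch. The function name contrastive_logit_loss, the temperature tau, and the InfoNCE-style cross-entropy formulation are illustrative assumptions rather than the paper's exact objective; it simply instantiates the idea of pulling same-sample teacher/student logits together while pushing different-sample logits apart.

    # Minimal sketch: sample-wise contrastive alignment of teacher and student logits.
    # The specific normalization, temperature, and InfoNCE-style loss are assumptions.
    import torch
    import torch.nn.functional as F

    def contrastive_logit_loss(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               tau: float = 0.1) -> torch.Tensor:
        """Both inputs have shape (batch_size, num_classes)."""
        # L2-normalize logit vectors so similarity is cosine similarity.
        s = F.normalize(student_logits, dim=-1)
        t = F.normalize(teacher_logits, dim=-1)

        # Pairwise similarity between every student sample and every teacher sample.
        sim = s @ t.t() / tau  # shape: (batch_size, batch_size)

        # Diagonal entries are positive (same-sample) pairs; off-diagonal entries are
        # negatives (different samples). Cross-entropy against the diagonal minimizes
        # intra-sample differences and maximizes inter-sample dissimilarities.
        targets = torch.arange(s.size(0), device=s.device)
        return F.cross_entropy(sim, targets)

    if __name__ == "__main__":
        torch.manual_seed(0)
        student = torch.randn(8, 100, requires_grad=True)  # e.g. 100-class logits
        teacher = torch.randn(8, 100)
        loss = contrastive_logit_loss(student, teacher)
        loss.backward()
        print(f"contrastive KD loss: {loss.item():.4f}")

In this sketch, the batch itself supplies the negatives: each sample's teacher logits act as the positive for its own student logits and as negatives for every other sample in the batch.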