Efficient Cross-Modality Knowledge Distillation with Contrastive Learning
This paper proposes a generalizable cross-modality contrastive distillation (CMCD) framework that leverages contrastive learning to distill knowledge from a source modality (e.g., images) to a target modality (e.g., sketches) without requiring labeled data in the target modality. The authors also provide a theoretical analysis showing that the algorithm's performance is bounded in terms of the distance between the source and target modality distributions.
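The central mechanism can be illustrated with an InfoNCE-style objective that pulls paired source/target embeddings together while pushing apart mismatched pairs. The sketch below is a minimal, generic version of such a cross-modal contrastive loss, not the paper's exact objective; the function name, temperature value, and the assumption that the i-th source and target embeddings form a positive pair are illustrative choices.

```python
import numpy as np

def cross_modal_contrastive_loss(src_emb, tgt_emb, temperature=0.1):
    """Generic InfoNCE loss between paired source (e.g., image) and
    target (e.g., sketch) embeddings; row i of each array is a positive pair."""
    # L2-normalize so the dot product is cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    # (N, N) similarity matrix between every target and every source embedding
    logits = (tgt @ src.T) / temperature
    # Subtract the row max for numerical stability before the softmax
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))
```

In a distillation setting, `src_emb` would come from a frozen encoder trained on the labeled source modality, and only the target-modality encoder producing `tgt_emb` would receive gradients.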