The paper addresses the distribution shifts between the substitution data and the original data in the data-free knowledge distillation (DFKD) task, an issue that has long been overlooked. The authors first construct a tailored causal graph to describe the causal relationships among the variables in the DFKD task, revealing that the distribution shifts act as a harmful confounder that significantly impacts the student's learning process.
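For context, causal-intervention approaches of this kind typically cut the confounding path with the standard backdoor adjustment; the following is a minimal sketch of that formula under the assumption that Z denotes the confounder (here, the distribution shift captured by the confounder dictionary), not a verbatim equation from the paper:

P(Y \mid \mathrm{do}(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

Intuitively, instead of letting the shifted substitution data bias P(Y | X) directly, the prediction is averaged over the confounder's prior, which the dictionary-based compensation described next approximates.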
To tackle this issue, the authors propose the Knowledge Distillation Causal Intervention (KDCI) framework. KDCI first constructs a confounder dictionary to capture prior knowledge about the substitution data, and then compensates for the biased student predictions using the confounder dictionary and the prototype proportions of each sample. This process aims to achieve de-confounded distillation and enable the student to learn pure knowledge from the teacher.
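A minimal sketch of such a compensation step, assuming (hypothetically) that the confounder dictionary is built by clustering substitution-data features into prototypes and that a per-prototype bias term is available; function names such as build_confounder_dictionary and deconfound_logits are illustrative, not the paper's actual API:

import numpy as np
from sklearn.cluster import KMeans

def build_confounder_dictionary(features, num_prototypes=16):
    # Cluster features of the generated substitution data; the cluster
    # centroids serve as the entries of the confounder dictionary.
    km = KMeans(n_clusters=num_prototypes, n_init=10).fit(features)
    return km.cluster_centers_                      # shape (K, D)

def prototype_proportions(features, dictionary, tau=1.0):
    # Soft-assign each sample to the prototypes (softmax over negative
    # distances); these proportions weight the compensation below.
    dists = np.linalg.norm(features[:, None, :] - dictionary[None, :, :], axis=-1)
    logits = -dists / tau
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)     # shape (N, K)

def deconfound_logits(student_logits, proportions, prototype_bias):
    # Backdoor-style compensation: subtract a proportion-weighted sum of
    # per-prototype bias terms from the biased student predictions.
    # prototype_bias (K, C) is assumed to be estimated during training.
    return student_logits - proportions @ prototype_bias

In this sketch, distillation would then match the compensated student logits against the teacher's outputs; the actual KDCI compensation may differ in how the per-prototype bias terms are estimated and applied.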
The authors conduct extensive experiments on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets, combining KDCI with six representative DFKD methods. The results demonstrate that KDCI consistently and significantly improves the performance of existing DFKD methods, e.g., raising the accuracy of DeepInv by up to 15.54% on CIFAR-100. The authors also provide detailed analyses of the components of KDCI and the impact of the confounder dictionary, further validating the effectiveness of the proposed framework.
Key insights distilled from:
by Yuzheng Wang... at arxiv.org, 03-29-2024
https://arxiv.org/pdf/2403.19539.pdf