Core Concepts
The core contribution of this paper is a causal-inference perspective on the distribution shift between the substitution data and the original training data in the data-free knowledge distillation (DFKD) task, together with a Knowledge Distillation Causal Intervention (KDCI) framework that de-confounds the biased student learning process.
Summary
The paper addresses the distribution shift between the substitution data and the original training data in the data-free knowledge distillation (DFKD) task, a long-overlooked issue. The authors first customize a causal graph describing the causalities among the variables in the DFKD task, which reveals that the distribution shift acts as a harmful confounder biasing the student's learning process.
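Under this causal reading, the standard remedy for a confounder is intervention via backdoor adjustment. The generic form of that adjustment is sketched below; the exact conditioning set and notation in the paper may differ, and the symbols X (input), Y (prediction), and the prototypes c_1, ..., c_N are assumptions for illustration:

```latex
% Backdoor adjustment: intervene on the input X rather than conditioning on it,
% averaging the prediction over the confounder C, approximated by a finite
% dictionary of prototypes {c_1, ..., c_N} drawn from the substitution data.
P(Y \mid do(X)) = \sum_{i=1}^{N} P(Y \mid X, c_i)\, P(c_i)
```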
To tackle this issue, the authors propose the Knowledge Distillation Causal Intervention (KDCI) framework. KDCI first constructs a confounder dictionary that captures prior knowledge about the substitution data, and then compensates the biased student predictions using the dictionary entries and their prototype proportions. This yields a de-confounded distillation process in which the student can learn purer knowledge from the teacher.
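A minimal PyTorch sketch of how such a compensation could be computed is shown below. The helper names (`build_confounder_dictionary`, `deconfound_logits`), the k-means construction of the dictionary, and the additive mixing scheme are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def build_confounder_dictionary(features: torch.Tensor,
                                num_prototypes: int,
                                iters: int = 10) -> torch.Tensor:
    """Cluster a feature bank from the substitution data into prototypes
    (a plain k-means sketch; the paper's actual construction may differ)."""
    # Initialize prototypes with random samples from the feature bank.
    idx = torch.randperm(features.size(0))[:num_prototypes]
    prototypes = features[idx].clone()
    for _ in range(iters):
        # Assign every feature to its nearest prototype (Euclidean distance).
        assign = torch.cdist(features, prototypes).argmin(dim=1)
        for k in range(num_prototypes):
            members = features[assign == k]
            if members.numel() > 0:
                prototypes[k] = members.mean(dim=0)
    return prototypes  # [num_prototypes, dim]

def deconfound_logits(student_logits: torch.Tensor,
                      feats: torch.Tensor,
                      prototypes: torch.Tensor,
                      classifier: torch.nn.Module) -> torch.Tensor:
    """Compensate biased student logits with prototype-conditioned predictions,
    weighted by each sample's proportions over the confounder dictionary."""
    # Prototype proportions: how strongly each sample matches each prototype.
    props = F.softmax(feats @ prototypes.t(), dim=1)   # [B, N]
    proto_logits = classifier(prototypes)              # [N, num_classes]
    compensation = props @ proto_logits                # [B, num_classes]
    return student_logits + compensation
```

The compensated logits would then stand in for the raw student logits in the usual temperature-scaled KD objective, e.g. `F.kl_div(F.log_softmax(deconf_logits / T, dim=1), F.softmax(teacher_logits / T, dim=1), reduction='batchmean') * T * T`.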
The authors conduct extensive experiments on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets, combining KDCI with six representative DFKD methods. The results show that KDCI consistently and significantly improves existing DFKD methods, e.g., raising the accuracy of DeepInv by up to 15.54% on CIFAR-100. Detailed analyses of KDCI's components and of the confounder dictionary further validate the effectiveness of the proposed framework.
Statistics
The CIFAR-10 and CIFAR-100 datasets each contain 50,000 training samples and 10,000 test samples at 32x32 resolution.
The Tiny-ImageNet dataset contains 100,000 training samples, 10,000 validation samples, and 10,000 test samples at 64x64 resolution.
The ImageNet dataset contains 1,000 classes with 1.28 million training samples and 50,000 validation samples at 224x224 resolution.
Quotes
"To our best knowledge, we are the first to alleviate the dilemma of the distribution shifts in the DFKD task from a causality-based perspective. Such shifts are regarded as the harmful confounder, which leads the student to learn misleading knowledge."
"We propose a KDCI framework to restrain the detrimental effect caused by the confounder and attempt to achieve the de-confounded distillation process. Besides, KDCI can be easily and flexibly combined with existing generation-based or sampling-based DFKD paradigms."
"Extensive experiments on the combination with six DFKD methods show that our KDCI can bring consistent and significant improvements to existing state-of-the-art models. Particularly, it improves the accuracy of the DeepInv [26] by up to 15.54% on the CIFAR-100 dataset."