Core Concepts
Successful knowledge distillation depends on sufficient sampling of the teacher model's output space and decision boundaries, and surprisingly, even unconventional datasets like unoptimized synthetic imagery can be effective when these criteria are met.
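The claim is that what matters is how thoroughly the transfer set exercises the teacher's outputs, not whether the images are "real." Below is a minimal sketch of response-based distillation on an arbitrary transfer set; the network objects, temperature, and optimizer are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: response-based knowledge distillation on a surrogate transfer
# set (e.g. OpenGL shader renders). Teacher/student models, the data loader,
# and the temperature are assumed placeholders, not the paper's setup.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, optimizer, temperature=4.0):
    """One distillation step: match the student's softmax to the teacher's.

    No ground-truth labels are required, so any image source can serve as
    the transfer set, provided it elicits a diverse range of teacher outputs.
    """
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)

    student_logits = student(images)

    # Soft-target KL divergence, scaled by T^2 as in Hinton et al. (2015).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```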
Statistics
Distilling with either in-distribution (ID) or out-of-distribution (OOD) ImageNet images comes within 1.5% accuracy of the CIFAR10-distilled student.
Distilling with OpenGL shader images comes within 2%, 5%, and 0.2% accuracy of the CIFAR10-, CIFAR100-, and EuroSAT-distilled students, respectively.
With data augmentation, distilling with CIFAR10 gained 8.7% in accuracy, whereas distilling with OpenGL shaders improved by 62.2%.
In a toy MNIST experiment, the OpenGL shader student reached 92.89% test accuracy, compared to 38.78% for the CIFAR10 student.
Students distilled from the CIFAR10, CIFAR100, and EuroSAT teachers using FGVCA gained 76.8%, 35.9%, and 38% in accuracy, respectively, when the adversarial attack method was applied.
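The large gains from the adversarial attack method are consistent with the core claim: perturbing transfer images toward the teacher's decision boundaries exposes the student to a richer set of teacher outputs. The sketch below assumes an FGSM-style perturbation for illustration; the paper's exact attack formulation may differ.

```python
# Hedged sketch: nudge transfer-set images toward the teacher's decision
# boundary with a single FGSM-style step. The epsilon value and the choice
# of FGSM are assumptions for illustration only.
import torch
import torch.nn.functional as F

def boundary_perturb(teacher, images, epsilon=0.03):
    """Return images pushed away from the teacher's argmax class.

    Samples near decision boundaries carry more information about how the
    teacher separates classes, which helps when the transfer set poorly
    covers the teacher's label space.
    """
    images = images.clone().detach().requires_grad_(True)
    logits = teacher(images)
    pseudo_labels = logits.argmax(dim=1)

    # Ascend the loss w.r.t. the teacher's own prediction, moving the
    # sample closer to (or across) a class boundary.
    loss = F.cross_entropy(logits, pseudo_labels)
    grad = torch.autograd.grad(loss, images)[0]
    return (images + epsilon * grad.sign()).detach().clamp(0.0, 1.0)
```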
Quotes
"is it possible to distill knowledge with even the most unconventional dataset?"
"does the data even need to be real?"
"if certain criteria are met, many different datasets can act as reasonable replacements when the original data are missing."
"one could reasonably be able to transfer knowledge to a student using unnatural synthetic imagery (i.e., the data does not need to be real)."