Core Concepts
The authors propose a novel self-supervised dataset distillation framework that distills an unlabeled dataset into a small set of synthetic samples, which can be effectively used for pre-training and transfer learning on various downstream tasks.
Abstract
The authors propose a novel problem of self-supervised dataset distillation, where the goal is to distill an unlabeled dataset into a small set of synthetic samples that can be effectively used for pre-training and transfer learning.
Key highlights:
- The authors observe that existing bilevel optimization methods for dataset distillation become unstable when the inner objective is a self-supervised learning (SSL) loss, because of the randomness introduced by data augmentations or masking. They prove that the gradient of the SSL loss with randomly sampled augmentations or masking is a biased estimator of the true gradient, which explains the instability (see the intuition sketch after this list).
- To avoid this randomness, the authors use mean squared error (MSE) for both the inner and outer objectives. The inner objective minimizes the MSE between the model's representations of the synthetic samples and their corresponding learnable target representations. The outer objective minimizes the MSE between the representations of the original data produced by the model trained on the synthetic samples and those produced by a self-supervised target model pre-trained on the full original dataset.
- The authors also simplify the inner optimization by decomposing the model into a feature extractor and a linear head, and solving only the linear head in closed form via kernel ridge regression while keeping the feature extractor frozen (see the code sketch after this list).
- The authors extensively validate their proposed method, KRR-ST, on various transfer learning tasks, architecture generalization, and target data-free knowledge distillation, and show that it outperforms existing supervised dataset distillation methods.
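A minimal sketch of the intuition behind the bias claim (this is not the paper's exact theorem; f_theta, a_omega, and ell are generic placeholders for the encoder, a sampled augmentation, and a nonlinear SSL loss):

```latex
% Intuition only: because the SSL loss \ell is nonlinear in the augmented views,
% the expectation over augmentations cannot be pushed inside the loss, so the
% gradient computed from sampled augmentations generally differs from the
% gradient taken at the augmentation-averaged quantity:
\[
\mathbb{E}_{\omega}\!\left[\nabla_{\theta}\,
  \ell\!\left(f_{\theta}\!\left(a_{\omega}(x)\right)\right)\right]
\;\neq\;
\nabla_{\theta}\,
  \ell\!\left(\mathbb{E}_{\omega}\!\left[f_{\theta}\!\left(a_{\omega}(x)\right)\right]\right).
\]
```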
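The MSE objectives and the kernel ridge regression head from the two highlights above can be illustrated with a minimal PyTorch-style sketch. All module choices, dimensions, and the regularizer `lam` are hypothetical placeholders, and the head is solved in the primal ridge form (equivalent to kernel ridge regression with a linear kernel); this is not the authors' implementation.

```python
# Hypothetical sketch of the MSE-based inner/outer objectives; not the authors' code.
import torch
import torch.nn.functional as F

d_in, d_feat, d_out = 32, 64, 128   # toy dimensions (placeholders)
n_syn, n_real = 100, 512            # distilled / original batch sizes (placeholders)
lam = 1e-3                          # ridge regularizer (placeholder value)

# Learnable distilled samples and their learnable target representations.
X_syn = torch.randn(n_syn, d_in, requires_grad=True)
Y_syn = torch.randn(n_syn, d_out, requires_grad=True)

# A frozen feature extractor (stand-in for a backbone trained on the distilled data)
# and a fixed SSL target model standing in for the one pre-trained on the full dataset.
feature_extractor = torch.nn.Linear(d_in, d_feat)
for p in feature_extractor.parameters():
    p.requires_grad_(False)
ssl_target_model = torch.nn.Linear(d_in, d_out)

X_real = torch.randn(n_real, d_in)  # a batch of original (real) data

# Inner objective: fit the linear head so that features of the synthetic samples
# map to their learnable targets, solved in closed form by ridge regression:
#   W = (Phi^T Phi + lam I)^{-1} Phi^T Y  minimizes ||Phi W - Y||^2 + lam ||W||^2.
Phi_syn = feature_extractor(X_syn)                       # (n_syn, d_feat)
A = Phi_syn.T @ Phi_syn + lam * torch.eye(d_feat)
W = torch.linalg.solve(A, Phi_syn.T @ Y_syn)             # (d_feat, d_out)

# Outer objective: the distilled-data model (frozen features + closed-form head)
# should reproduce the SSL target model's representations on the original data.
pred_real = feature_extractor(X_real) @ W                # (n_real, d_out)
with torch.no_grad():
    target_real = ssl_target_model(X_real)               # (n_real, d_out)
outer_loss = F.mse_loss(pred_real, target_real)

# Gradients flow through the closed-form solve back to X_syn and Y_syn,
# which is how the distilled data and its targets are updated.
outer_loss.backward()
print(outer_loss.item(), X_syn.grad.shape, Y_syn.grad.shape)
```

Because the head has a closed-form solution, the outer gradient reaches X_syn and Y_syn without unrolling an inner optimization loop, which is the efficiency and stability argument behind replacing the inner SSL training with regression.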
Stats
The authors use CIFAR100, TinyImageNet, and ImageNet as source datasets, and CIFAR10, Aircraft, Stanford Cars, CUB2011, Stanford Dogs, and Flowers as target datasets.
The data compression ratio for the source datasets is 2% for CIFAR100 and TinyImageNet, and around 0.08% for ImageNet.
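Assuming the standard training splits (50,000 images for CIFAR100, 100,000 for TinyImageNet, and roughly 1.28M for ImageNet), these ratios correspond to on the order of 1,000 distilled images for CIFAR100, 2,000 for TinyImageNet, and roughly 1,000 for ImageNet.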
Quotes
"We have empirically found that back-propagating through data augmentation or masking is unstable and challenging."
"We prove that a gradient of the self-supervised learning (SSL) loss with randomly sampled data augmentations or masking is a biased estimator of the true gradient, explaining the instability."