Self-supervised Pretraining Enhances Dataset Distillation for Large-scale Models
Self-supervised pretraining yields batch normalization statistics with larger variance, which enables more informative data synthesis and outperforms previous supervised dataset distillation methods, especially when using larger recovery models.
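
The synthesis referred to here matches feature statistics against a pretrained model's batch normalization layers. As a concrete illustration, the sketch below shows the common statistics-matching recovery procedure (in the style of DeepInversion/SRe2L): synthetic images are optimized so that the per-channel mean and variance at each BN layer match the layer's running statistics. This is a minimal, hedged sketch, not the paper's implementation; `BNStatsMatchingHook`, `synthesize`, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class BNStatsMatchingHook:
    """Records, at one BatchNorm2d layer, the squared distance between the
    current batch's channel-wise mean/variance and the layer's running
    (pretrained) statistics."""
    def __init__(self, bn: nn.BatchNorm2d):
        self.loss = torch.tensor(0.0)
        self.handle = bn.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        x = inputs[0]  # NCHW activations entering the BN layer
        mean = x.mean(dim=[0, 2, 3])
        var = x.var(dim=[0, 2, 3], unbiased=False)
        self.loss = (mean - module.running_mean).pow(2).sum() + \
                    (var - module.running_var).pow(2).sum()

    def remove(self):
        self.handle.remove()

def synthesize(model: nn.Module, num_images=16, steps=200, lr=0.1, device="cpu"):
    """Optimize random images so that BN feature statistics match the model's
    running statistics (the 'recovery' step of statistics-matching distillation)."""
    model.eval().to(device)  # eval mode: running stats are read, not updated
    hooks = [BNStatsMatchingHook(m) for m in model.modules()
             if isinstance(m, nn.BatchNorm2d)]
    images = torch.randn(num_images, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(images)  # hooks compute per-layer statistics losses
        loss = sum(h.loss for h in hooks)
        loss.backward()
        opt.step()
    for h in hooks:
        h.remove()
    return images.detach()

if __name__ == "__main__":
    # Any BN-based backbone works as the recovery model; in the setting the
    # TL;DR describes, its weights would come from self-supervised pretraining.
    resnet = models.resnet18(weights=None)
    distilled = synthesize(resnet, num_images=4, steps=10)
    print(distilled.shape)  # torch.Size([4, 3, 224, 224])
```

Under this view, the TL;DR's claim is that self-supervised pretraining produces running statistics with larger variance, so matching them constrains the synthetic images more informatively than statistics from supervised pretraining.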