Core Concepts
The authors propose Distributional Dataset Distillation (D3), a memory-efficient method that distills datasets into compact distributional representations. They also introduce federated distillation to scale the process up efficiently.
Abstract
The paper discusses the challenges facing existing dataset distillation methods and introduces D3, a novel approach that achieves efficient compression through distributional representations. It also presents a federated distillation strategy that parallelizes the distillation process for large datasets such as ImageNet-1K.
The paper highlights the limitations of current prototype-based methods in terms of storage cost and post-distillation training time. It introduces D3 as a solution that encodes data using minimal sufficient per-class statistics paired with decoders, yielding a more memory-efficient representation.
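To make "per-class statistics plus a decoder" concrete, here is a minimal sketch under stated assumptions: each class is summarized by the per-dimension mean and standard deviation of its latent codes (a diagonal Gaussian), new latents are sampled from that distribution, and a shared decoder maps them back to input space. The diagonal-Gaussian choice and the names `fit_class_stats`, `sample_latents`, and `linear_decoder` are hypothetical illustrations, not the paper's exact parameterization or code.

```python
import random
import statistics

# Hypothetical sketch: summarize each class by a diagonal Gaussian over its
# latent codes. This is an illustrative assumption, not D3's exact model.

def fit_class_stats(latents_by_class):
    """Return {label: (means, stds)} computed per latent dimension."""
    stats = {}
    for label, codes in latents_by_class.items():
        dims = list(zip(*codes))  # transpose: one tuple per latent dimension
        means = [statistics.fmean(d) for d in dims]
        stds = [statistics.pstdev(d) for d in dims]
        stats[label] = (means, stds)
    return stats

def sample_latents(stats, label, n, rng):
    """Draw n latent vectors for one class from its diagonal Gaussian."""
    means, stds = stats[label]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

def linear_decoder(z, weights):
    """Hypothetical shared decoder: a single linear map from latent to 'pixels'."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

if __name__ == "__main__":
    rng = random.Random(0)
    latents = {"cat": [[0.0, 1.0], [2.0, 3.0]], "dog": [[5.0, 5.0], [5.0, 5.0]]}
    stats = fit_class_stats(latents)
    decoded = [linear_decoder(z, [[1, 0], [0, 1], [1, 1]])
               for z in sample_latents(stats, "cat", 4, rng)]
    print(stats["cat"], len(decoded))
```

The design point the sketch illustrates: each class costs only two vectors of latent dimension to store, no matter how many samples are later drawn and decoded for training.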
Federated distillation decomposes a dataset into subsets, distills them in parallel using sub-task experts, and then re-aggregates the results, allowing the process to scale efficiently. The study evaluates these methods on metrics such as total storage cost, downstream training cost, and recovery accuracy.
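The decompose / distill-in-parallel / re-aggregate pattern can be sketched as follows. This is a hypothetical illustration only: classes are split round-robin into disjoint subsets, and the per-subset "expert" is a stand-in that summarizes each class by its mean vector. The paper's sub-task experts are trained models, which this sketch does not attempt to reproduce; all function names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of federated distillation's control flow. The "expert"
# below only averages samples; it stands in for a real sub-task expert model.

def split_classes(labels, num_subsets):
    """Partition class labels round-robin into num_subsets disjoint groups."""
    groups = [[] for _ in range(num_subsets)]
    for i, label in enumerate(sorted(labels)):
        groups[i % num_subsets].append(label)
    return groups

def distill_subset(data, subset):
    """Stand-in distiller: summarize each class in the subset by its mean vector."""
    summary = {}
    for label in subset:
        rows = data[label]
        summary[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return summary

def federated_distill(data, num_subsets, max_workers=4):
    """Distill disjoint class subsets independently, then merge the results."""
    subsets = split_classes(data.keys(), num_subsets)
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Subsets share no classes, so each job can run independently.
        for partial in pool.map(lambda s: distill_subset(data, s), subsets):
            merged.update(partial)  # re-aggregate: no key collisions
    return merged

if __name__ == "__main__":
    data = {"a": [[0.0, 2.0], [2.0, 4.0]], "b": [[1.0, 1.0]]}
    print(federated_distill(data, 2))
```

Threads keep the sketch short and self-contained; because of Python's GIL they do not speed up CPU-bound work, so a real deployment would more plausibly place each subset on its own process, GPU, or machine.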
Key points include the importance of evaluating dataset distillation methods comprehensively, the proposal of D3 for efficient compression, and the introduction of federated distillation for scalability. The study showcases state-of-the-art results on TinyImageNet and ImageNet-1K under different storage budgets.
Stats
D3 outperforms prior art by 6.9% on ImageNet-1K under a storage budget of 2 images per class.
SRe2L achieved a ∼100× reduction in images per class (IPC) on ImageNet-1K with ∼77% classification accuracy.
Storage cost analysis reveals unexpected costs from distilled labels that are not captured by the IPC metric.
Downstream training time increases because of the decoding, generation, and augmentation procedures required after distillation.
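Why labels can make the true compression rate lower than IPC suggests is a matter of simple arithmetic. The sketch below expresses total storage in "raw image equivalents per class", assuming for illustration that images are stored as uint8 pixels and that a method stores one float32 soft-label vector (length = number of classes) per distilled image per training epoch. This labeling scheme and the function names are hypothetical assumptions, not figures taken from the paper.

```python
# Hypothetical storage accounting: counts image bytes plus soft-label bytes,
# then expresses the total as an equivalent number of raw images per class.

def image_bytes(num_classes, ipc, height, width, channels=3):
    """Bytes for the distilled images, assuming uint8 (1 byte) per value."""
    return num_classes * ipc * height * width * channels

def soft_label_bytes(num_classes, ipc, epochs, bytes_per_value=4):
    """Assumption: one float32 soft-label vector per image per epoch."""
    return num_classes * ipc * epochs * num_classes * bytes_per_value

def storage_in_image_equivalents(num_classes, ipc, height, width, epochs):
    """Total storage divided by the size of one raw image, per class."""
    total = (image_bytes(num_classes, ipc, height, width)
             + soft_label_bytes(num_classes, ipc, epochs))
    per_image = height * width * 3
    return total / (num_classes * per_image)

if __name__ == "__main__":
    # Toy setting: 10 classes, 1 image per class, 8x8 RGB, 24 epochs of labels.
    print(storage_in_image_equivalents(10, 1, 8, 8, 24))
```

In the toy setting each 192-byte image carries 960 bytes of labels, so one nominal image per class actually occupies the space of six raw images; with no stored labels (`epochs=0`) the result falls back to the nominal IPC.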
Quotes
"Our method achieves state-of-the-art results on TinyImageNet and ImageNet-1K."
"We propose new evaluation protocols and metrics for dataset distillation methods."
"The true compression rate of existing methods is much lower than implied by the IPC metric."