Dataset condensation aims to distill the critical attributes of an original dataset into a much smaller synthetic dataset while preserving diversity and realism. Previous methods have struggled with high computational costs or a restricted design space, limiting their effectiveness on large-scale datasets.
To address these limitations, the authors propose Elucidate Dataset Condensation (EDC), a comprehensive design framework built around the following concrete strategies (each illustrated with a brief code sketch after the list):
Real image initialization: Using real images instead of Gaussian noise for data initialization, which improves the realism of the condensed dataset and simplifies the optimization process.
Soft category-aware matching: Employing a Gaussian Mixture Model (GMM) to effectively approximate complex data distributions and align the condensed dataset with the original dataset at the category level.
Flatness regularization: Applying a lightweight flatness regularization approach during data synthesis to ensure a flat loss landscape, enhancing the generalization capability of the condensed dataset.
Smoother learning-rate schedule and smaller batch size: Using a smoothly decaying learning-rate schedule and a smaller batch size during post-evaluation (when models are trained on the condensed data) to prevent under-convergence and improve performance.
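A minimal sketch of the first two strategies, assuming PyTorch and scikit-learn. The function names (init_from_real, fit_category_gmm, gmm_matching_loss) and the exact form of the matching loss are illustrative assumptions, not the authors' implementation: the condensed images start as copies of real samples, and each category's real features are summarized by a GMM that the synthetic features are pulled toward.

```python
import math
import torch
from sklearn.mixture import GaussianMixture


def init_from_real(real_images_per_class, ipc):
    """Real-image initialization: start each class's condensed set from
    `ipc` real samples instead of Gaussian noise."""
    return {c: imgs[:ipc].clone().requires_grad_(True)
            for c, imgs in real_images_per_class.items()}


def fit_category_gmm(real_feats, n_components=3):
    """Fit a diagonal-covariance GMM to one category's real-image features
    and return its parameters as torch tensors."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(real_feats.cpu().numpy())
    to_t = lambda a: torch.as_tensor(a, dtype=torch.float32)
    return to_t(gmm.weights_), to_t(gmm.means_), to_t(gmm.covariances_)


def gmm_matching_loss(syn_feats, weights, means, variances):
    """Soft category-aware matching (illustrative): negative mean log-likelihood
    of the synthetic features under the category's GMM, differentiable in syn_feats."""
    d = syn_feats.shape[1]
    diff = syn_feats.unsqueeze(1) - means.unsqueeze(0)          # (N, K, D)
    log_comp = -0.5 * (d * math.log(2 * math.pi)
                       + torch.log(variances).sum(-1)
                       + (diff ** 2 / variances).sum(-1))       # (N, K)
    log_prob = torch.logsumexp(torch.log(weights) + log_comp, dim=1)
    return -log_prob.mean()
```

In a synthesis loop, syn_feats would come from a frozen feature extractor applied to the condensed images, so minimizing the loss pushes the condensed set toward the category-level feature distribution of the original data.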
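The paper describes its flatness regularization as lightweight; its exact formulation is not reproduced here. The sketch below instead uses a sharpness-aware-minimization-style perturbation as one standard way to encourage a flat loss landscape during data synthesis; the helper name flat_synthesis_step and the match_loss_fn placeholder are assumptions.

```python
import torch


def flat_synthesis_step(syn_images, match_loss_fn, optimizer, rho=0.05):
    """One synthesis step with a sharpness-aware-style perturbation: ascend to a
    nearby worst-case point of the matching loss, then descend using the gradient
    computed there, biasing the condensed data toward flat loss regions."""
    loss = match_loss_fn(syn_images)
    grad = torch.autograd.grad(loss, syn_images)[0]
    eps = rho * grad / (grad.norm() + 1e-12)              # ascent direction

    perturbed = (syn_images + eps).detach().requires_grad_(True)
    sharp_grad = torch.autograd.grad(match_loss_fn(perturbed), perturbed)[0]

    optimizer.zero_grad()
    syn_images.grad = sharp_grad                          # descend with the "flat" gradient
    optimizer.step()
    return loss.item()
```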
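For the post-evaluation recipe, a hedged sketch of the idea: a smaller batch size combined with a smoothly decaying schedule (cosine annealing is used here as one example of a smooth schedule). The hyperparameter values, the SGD optimizer, and the helper name build_eval_training are placeholders, not the paper's settings.

```python
import torch
from torch.utils.data import DataLoader


def build_eval_training(model, condensed_set, epochs=300, batch_size=64, lr=0.01):
    """Post-evaluation setup: a smaller batch size plus a smoothly decaying
    learning-rate schedule to avoid under-convergence when training a model
    on a small condensed dataset."""
    loader = DataLoader(condensed_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs * len(loader))
    return loader, optimizer, scheduler
```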
The authors extensively evaluate EDC on various datasets, including ImageNet-1k, CIFAR-10/100, and Tiny-ImageNet, and demonstrate state-of-the-art performance while significantly reducing computational costs compared to previous methods. EDC also exhibits strong cross-architecture generalization, outperforming the latest state-of-the-art method, RDED, by substantial margins.
The comprehensive design choices and thorough empirical analysis in this work provide valuable insights and a benchmark for future research in the field of dataset condensation.
Key insights extracted from the paper by Shitong Shao et al., arxiv.org, 04-23-2024: https://arxiv.org/pdf/2404.13733.pdf