Core Concepts
The PopulAtion Parameter Averaging (PAPA) method improves generalization by efficiently combining diverse models.
Summary
Abstract: PAPA combines the generality of ensembling with the efficiency of weight averaging.
Introduction: Ensemble methods leverage multiple models for improved performance but come at a high computational cost.
PopulAtion Parameter Averaging (PAPA): Describes the method and its variants, showcasing performance gains.
Training a population of networks: Explains how PAPA occasionally pushes each network's weights towards the population average while the networks train on varied data, keeping them similar enough to average yet diverse.
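The soft pull towards the population average can be sketched as below. This is a minimal illustration, not the paper's implementation: the pull rate `alpha` and the toy parameter vectors are hypothetical, and in practice the pull is applied every few SGD steps to real network parameters.

```python
import numpy as np

def papa_pull(weights, alpha=0.01):
    """Soft-pull each model's weights towards the population mean.

    weights: list of per-model parameter vectors (np.ndarray).
    alpha: pull rate (hypothetical value for illustration).
    """
    mean = np.mean(weights, axis=0)
    # Each model keeps most of its own weights but drifts towards the mean.
    return [(1.0 - alpha) * w + alpha * mean for w in weights]

# Two toy "models" with three parameters each.
population = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])]
population = papa_pull(population, alpha=0.5)
```

Because the pull is partial, the models stay distinct (preserving diversity) while remaining close enough in weight space to be averaged later.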
Special cases of PAPA: Discusses the PAPA-all and PAPA-2 variants, which rarely (e.g., every few epochs) replace the weights with averages rather than softly pulling them, and how this affects generalization.
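A rough sketch of the two variants, under the assumption (stated in the summary) that PAPA-all replaces every model's weights with the population average and PAPA-2 replaces them with the average of a randomly chosen pair; the toy vectors are hypothetical.

```python
import random
import numpy as np

def papa_all(weights):
    """PAPA-all: occasionally replace every model's weights
    with the population average."""
    mean = np.mean(weights, axis=0)
    return [mean.copy() for _ in weights]

def papa_2(weights, rng=random):
    """PAPA-2: replace each model's weights with the average
    of a randomly selected pair of models."""
    n = len(weights)
    out = []
    for _ in range(n):
        i, j = rng.sample(range(n), 2)  # two distinct models
        out.append(0.5 * (weights[i] + weights[j]))
    return out

pop = [np.array([0.0, 2.0]), np.array([4.0, 2.0])]
all_avg = papa_all(pop)
pair_avg = papa_2(pop)
```

Unlike the soft pull, these replacements collapse diversity at the moment they are applied, which is why they are used rarely during training.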
Handling changes in learning rates: Details how learning rate changes affect the trade-off between the SGD updates and the pull towards the population average.
Inference with the population: Explains how models are returned as single networks after training.
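Since PAPA keeps the population close in weight space, a single deployable network can be obtained by averaging the trained models; a minimal sketch with hypothetical toy weights:

```python
import numpy as np

def final_model(weights):
    """Collapse the trained population into one network by uniform
    weight averaging (viable because PAPA keeps models close in
    weight space)."""
    return np.mean(weights, axis=0)

pop = [np.array([1.0, 3.0]), np.array([3.0, 1.0])]
deployed = final_model(pop)
```

The result is one network with ensemble-like accuracy but single-model inference cost.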
Related work: Compares PAPA to other methods like federated learning, genetic algorithms, and consensus optimization.
Experiments: Showcases results on CIFAR datasets, ImageNet, fine-tuning experiments, and comparisons with DART and SWA.
Discussion: Analyzes why averaging weights improves model performance.
Conclusion: Concludes that PAPA is an efficient method for training large models on extensive data.
Statistics
PAPA reduces the performance gap between averaging and ensembling, increasing accuracy by up to 1.9% on CIFAR-100.
Quotes
"Weight averaging is only beneficial when the models are different enough to benefit from combining them." - Alexia Jolicoeur-Martineau