Core Concepts
PopulAtion Parameter Averaging (PAPA) improves model generalization by combining diverse models efficiently.
Summary
Abstract:
Ensemble methods improve performance but are computationally expensive.
Weight averaging is less costly but may not always perform well.
PAPA combines the benefits of ensembling with the efficiency of weight averaging.
Introduction:
Ensembles incur high computational costs at inference time.
Weight averaging can be less expensive but lacks guarantees of good performance.
PAPA leverages a population of diverse models to improve generalization.
PopulAtion Parameter Averaging (PAPA):
Trains a population of models while pushing each model's weights towards the population average.
Variants such as PAPA-all and PAPA-2 apply the averaging less frequently (e.g., every few epochs) rather than pushing continuously.
Results show improved accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.
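The two variants can be contrasted in a short sketch. This is an illustrative interpretation of the summary above, not the paper's exact procedure: `papa_all` replaces every model with the population average, while `papa_2` averages randomly chosen pairs; the schedule and shapes are assumptions.

```python
import random
import numpy as np

def papa_all(weights):
    """PAPA-all (sketch): replace every model with the population average."""
    avg = np.mean(weights, axis=0)
    return [avg.copy() for _ in weights]

def papa_2(weights, rng=random):
    """PAPA-2 (sketch): average random pairs of models instead of all at once."""
    idx = list(range(len(weights)))
    rng.shuffle(idx)
    out = list(weights)
    for a, b in zip(idx[::2], idx[1::2]):
        pair_avg = (weights[a] + weights[b]) / 2
        out[a], out[b] = pair_avg.copy(), pair_avg.copy()
    return out
```

With only two models the pair average and the full average coincide; the variants differ once the population is larger.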
Training a Population of Networks:
The idea of averaging weights is simple, but the networks must stay aligned in weight space for the average to perform well.
An EMA-style update pushes each model's weights towards the population average without collapsing the models' diversity.
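The EMA-style push can be sketched as follows. The rate `alpha` and the flat parameter vectors are illustrative assumptions: an `alpha` close to 1 keeps most of each model's individual progress, so diversity is preserved.

```python
import numpy as np

def papa_push(weights, alpha=0.99):
    """Nudge each model toward the population average (sketch).

    weights: list of per-model parameter vectors (np.ndarray).
    """
    avg = np.mean(weights, axis=0)  # population average of the weights
    return [alpha * w + (1 - alpha) * avg for w in weights]

# Example: with alpha=0.9, each model moves 10% of the way toward
# the average [2.0, 3.0].
models = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
pushed = papa_push(models, alpha=0.9)
```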
Handling Changes in Learning Rates:
When the learning-rate schedule changes, the strength of the push towards the average is adjusted accordingly, so the balance between the SGD updates and the PAPA averaging is maintained.
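One way to realize this balance, under the assumption (not taken from the paper) that the pull toward the average should scale proportionally with the SGD step size:

```python
def adjust_push_rate(alpha, lr_old, lr_new):
    """Rescale the EMA push rate when the learning rate changes (sketch).

    Keeps (1 - alpha), the pull toward the average, proportional to the
    learning rate, so neither SGD nor the averaging dominates after a
    schedule change. The proportional rule is an assumption.
    """
    push = (1 - alpha) * (lr_new / lr_old)
    return 1 - push
```

For example, halving the learning rate halves the pull: an `alpha` of 0.99 becomes 0.995.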
Inference with the Population:
Models are pushed towards the average during training; at inference, the population is averaged into a single model, so prediction costs no more than one network.
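A toy example of why this merge can behave like an ensemble: for a purely linear model, averaging weights is exactly equivalent to averaging outputs. For nonlinear networks the equivalence is only approximate, which is why the training procedure keeps the population close. The data and weights below are made up for illustration.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])                      # one input example
population = [np.array([0.2, 0.1, -0.3]),           # two "trained" linear models
              np.array([0.4, -0.1, 0.1])]

# Ensemble: average the two models' outputs.
ensemble_out = np.mean([w @ x for w in population])

# Weight averaging: merge the population into one model, then predict once.
merged = np.mean(population, axis=0)
merged_out = merged @ x
# For linear models the two outputs match exactly.
```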
Related Work:
Comparison with DART shows similar results, highlighting the benefits of the PAPA variants.
Statistics
Among the variants, PAPA tends to perform best.
PAPA increases the accuracy of the population on the CIFAR datasets.
Weight averaging narrows the performance gap between a single model and a full ensemble.