Core Concepts
By leveraging the geometric properties of fine-tuned weights, Model Stock approximates the center of the weight distribution using only a few fine-tuned models, achieving superior in-distribution and out-of-distribution performance compared to existing methods.
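This summary does not spell out the exact merging rule. As a rough sketch based on our reading of the paper (so treat the specific ratio as an assumption), each layer's merged weight interpolates between the plain average of the k fine-tuned weights and the pre-trained anchor w0, with a ratio set by the per-layer angle theta between the fine-tuned weights' offsets from w0:

```latex
w_{\mathrm{avg}} = \frac{1}{k}\sum_{i=1}^{k} w_i, \qquad
t = \frac{k\cos\theta}{(k-1)\cos\theta + 1}, \qquad
w_{\mathrm{merged}} = t\, w_{\mathrm{avg}} + (1 - t)\, w_0 .
```

In the paper's two-model setting (k = 2) this reduces to t = 2cos(theta)/(1 + cos(theta)); because theta is measured separately for each layer, the averaging is layer-wise.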
Summary
The paper introduces an efficient fine-tuning method called Model Stock that outperforms existing techniques like Model Soup while using significantly fewer fine-tuned models.
Key insights:
- Fine-tuned weights from different random seeds lie on a thin shell in weight space, with an angle and norm that are nearly constant for each layer across seeds.
- Proximity to the center of the weight distribution correlates with improved in-distribution and out-of-distribution performance.
- Model Stock leverages these geometric properties to approximate the center of the weight distribution using only two fine-tuned models, with no additional training and no heuristically chosen hyperparameters (see the code sketch after this list).
- Experiments on CLIP ViT-B/32, ViT-B/16, and ViT-L/14 models show that Model Stock achieves state-of-the-art performance on ImageNet and distribution shift benchmarks, while being computationally more efficient than previous methods.
- The paper also provides new insights into the underlying mechanics of prior studies like WiSE-FT and Model Soup, interpreting their effectiveness through the lens of proximity to the weight distribution center.
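To make the geometry concrete, here is a minimal, hedged sketch of this layer-wise anchored averaging in PyTorch. It assumes the interpolation ratio t = k·cos(theta) / ((k−1)·cos(theta) + 1) from the sketch above, with theta computed per layer from the fine-tuned weights' offsets relative to the pre-trained anchor; the function name `model_stock_merge`, the pairwise-cosine averaging, and the handling of non-float buffers are our own illustrative choices, not taken from the paper.

```python
import itertools
import torch


def model_stock_merge(pretrained_sd, finetuned_sds):
    """Layer-wise anchored averaging of k fine-tuned checkpoints (illustrative sketch).

    pretrained_sd : state_dict of the pre-trained anchor model (w0)
    finetuned_sds : list of k state_dicts fine-tuned from w0 with different random seeds
    """
    k = len(finetuned_sds)
    assert k >= 2, "need at least two fine-tuned models to measure an angle"
    merged = {}
    for name, w0 in pretrained_sd.items():
        if not torch.is_floating_point(w0):
            # Copy integer buffers (e.g. step counters) straight from the anchor.
            merged[name] = w0.clone()
            continue

        w0f = w0.float()
        # Offsets of each fine-tuned weight from the pre-trained anchor, for this layer.
        diffs = [sd[name].float() - w0f for sd in finetuned_sds]

        # Layer-wise angle: average pairwise cosine similarity between the offsets.
        cosines = [
            torch.nn.functional.cosine_similarity(a.flatten(), b.flatten(), dim=0)
            for a, b in itertools.combinations(diffs, 2)
        ]
        # Clamp guards against negative cosines pushing the ratio out of [0, 1].
        cos_theta = torch.stack(cosines).mean().clamp(min=0.0)

        # Assumed interpolation ratio: t = k*cos(theta) / ((k-1)*cos(theta) + 1).
        t = k * cos_theta / ((k - 1) * cos_theta + 1)

        w_avg = w0f + torch.stack(diffs).mean(dim=0)  # plain average of fine-tuned weights
        merged[name] = (t * w_avg + (1.0 - t) * w0f).to(w0.dtype)
    return merged
```

For the two-model setting described above, `finetuned_sds` would hold the state dicts of two CLIP checkpoints fine-tuned from the same pre-trained weights with different seeds, e.g. `merged = model_stock_merge(pretrained.state_dict(), [ft_a.state_dict(), ft_b.state_dict()])`, after which the merged dict can be loaded back with `load_state_dict`.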
Statistics
Model Stock results reported in the paper:
- CLIP ViT-B/32: 81.19% ImageNet top-1 accuracy; 48.69% average accuracy across 5 distribution shift benchmarks.
- CLIP ViT-B/16: 85.2% ImageNet top-1 accuracy; 60.1% average accuracy across 5 distribution shift benchmarks.
- CLIP ViT-L/14: 87.7% ImageNet top-1 accuracy; 73.5% average accuracy across 5 distribution shift benchmarks.
Quotes
"Our innovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models."
"Model Stock approximates the merged weight using just a few fine-tuned models, leveraging the weight space's geometric properties and a pre-trained model's anchoring effect."
"We achieve performance comparable to, or even surpassing, that of the more resource-intensive methods such as Model Soup [32], using only a fraction of the models."