Core Concepts
Selectively regularizing parameter updates during fine-tuning, rather than applying uniform regularization across all layers, improves both in-distribution generalization and out-of-distribution robustness in foundation models.
Tian, J., Huang, C., & Kira, Z. (2024). Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models. Advances in Neural Information Processing Systems, 37.
This paper investigates the limitations of traditional weight decay when fine-tuning pre-trained foundation models and proposes Selective Projection Decay (SPD), which constrains deviation from the pre-trained initialization only where regularization is beneficial, improving robustness and generalization.
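To make the core idea concrete, here is a minimal toy sketch of selective regularization toward pre-trained weights. It contrasts with uniform weight decay (which shrinks every parameter toward zero): only parameters in a chosen set receive a penalty pulling them back toward their pre-trained values. The function name, signature, and selection mechanism are illustrative assumptions, not the paper's actual SPD implementation.

```python
def selective_decay_update(params, pre_trained, grads, selected, lr=0.01, decay=0.1):
    """Toy sketch: apply decay toward pre-trained weights, but only
    for parameters named in `selected` (hypothetical interface)."""
    updated = {}
    for name, w in params.items():
        step = lr * grads[name]
        if name in selected:
            # Penalize deviation from the pre-trained value, not from zero,
            # and only for selected parameters.
            step += lr * decay * (w - pre_trained[name])
        updated[name] = w - step
    return updated

# Usage: with zero gradients, only the selected parameter "a" is pulled
# back toward its pre-trained value; "b" is left untouched.
params = {"a": 1.0, "b": 1.0}
pre = {"a": 0.0, "b": 0.0}
grads = {"a": 0.0, "b": 0.0}
out = selective_decay_update(params, pre, grads, selected={"a"}, lr=1.0, decay=0.5)
```

The design choice this illustrates: regularizing toward the pre-trained initialization preserves pre-trained knowledge (aiding out-of-distribution robustness), while applying it selectively leaves other parameters free to adapt to the downstream task.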