Core Concepts
A simple combination of Low-Rank Adaptation (LoRA) and Gaussian Stochastic Weight Averaging (SWAG) enables approximate Bayesian inference in large language models, improving their generalization and calibration.
Abstract
The paper proposes a method that combines Low-Rank Adaptation (LoRA) and Gaussian Stochastic Weight Averaging (SWAG) to enable efficient and effective Bayesian adaptation of large language models (LLMs).
Key highlights:
LLMs often suffer from overconfidence and poor calibration, especially when fine-tuned on small datasets.
LoRA enables parameter-efficient fine-tuning of LLMs by adding trainable low-rank adaptation matrices alongside the frozen pretrained weights, but the resulting models still exhibit poor calibration (a minimal LoRA sketch follows this list).
The authors integrate SWAG, a simple approximate Bayesian inference method that fits a Gaussian to the weight iterates visited along the SGD trajectory, with LoRA to obtain an approximate Bayesian treatment of the LoRA parameters alone (see the second sketch after this list).
Through extensive testing on NLP benchmarks, the authors demonstrate that their SWAG-LoRA approach improves model generalization and calibration compared to standard LoRA fine-tuning, MC Dropout, and LoRA ensembles.
The authors also show that their method exhibits greater robustness against distribution shift, outperforming more sophisticated techniques like Laplace-LoRA on out-of-distribution tasks.
The key advantages of the proposed method are its simplicity, computational efficiency, and consistent improvements in accuracy and calibration across various datasets.
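To make the LoRA mechanism concrete, here is a minimal PyTorch sketch of a LoRA-augmented linear layer. This is an illustration of the general LoRA technique, not the authors' code; the class name LoRALinear and the defaults r=8 and alpha=16.0 are illustrative choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA layer: the frozen base weight W is augmented with a
    trainable low-rank update (alpha / r) * B @ A, so only r * (d_in + d_out)
    parameters are trained instead of d_in * d_out."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        # A starts small and random, B starts at zero, so the low-rank
        # update is zero at initialization and fine-tuning begins
        # exactly at the pretrained model.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because only A and B are trained, the number of adapted parameters is a small fraction of the full weight matrix, which is what makes a Bayesian treatment of just these parameters tractable.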
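And here is a sketch of SWAG applied over those LoRA parameters. The paper's full SWAG posterior also maintains a low-rank covariance component; for brevity this sketch shows only the diagonal variant, and the class DiagonalSWAG with its collect/sample methods is a hypothetical illustration, assuming PyTorch.

```python
import torch

class DiagonalSWAG:
    """Diagonal SWAG over the trainable (LoRA) parameters only: tracks
    running first and second moments of the weights along the SGD
    trajectory, then samples weights from the fitted Gaussian at test time."""

    def __init__(self, params):
        self.params = list(params)  # live references to LoRA params (A, B)
        self.n = 0
        self.mean = [torch.zeros_like(p) for p in self.params]
        self.sq_mean = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def collect(self):
        """Call periodically (e.g., once per epoch) late in fine-tuning."""
        self.n += 1
        for m, s, p in zip(self.mean, self.sq_mean, self.params):
            m.mul_((self.n - 1) / self.n).add_(p / self.n)
            s.mul_((self.n - 1) / self.n).add_(p.pow(2) / self.n)

    @torch.no_grad()
    def sample(self, scale: float = 1.0):
        """Overwrite the live LoRA params with one draw from
        N(mean, diag variance); snapshot them first if needed."""
        for m, s, p in zip(self.mean, self.sq_mean, self.params):
            var = (s - m.pow(2)).clamp(min=1e-30)
            p.copy_(m + scale * var.sqrt() * torch.randn_like(m))
```

At test time one would typically draw several weight samples, run a forward pass under each, and average the predictive softmax distributions; this Bayesian model averaging over the LoRA weights is what yields the calibration improvements reported above.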
Stats
Fine-tuning large language models on the full set of weights is inefficient and prohibitively expensive.
Quotes
"Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets."
"We propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs."
"Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration."