Core Concepts
Incorporating additional Kullback-Leibler (KL) regularization and using a mixture of previous iterates as the opponent can mitigate performance instability issues in the self-play fine-tuning (SPIN) approach for aligning language models with human preferences.
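To make this concrete, here is a minimal sketch of the kind of objective involved, assuming a SPIN-style logistic loss ℓ(z) = log(1 + e^{-z}) and assuming the opponent mixture also serves as the reference policy in the log-ratios; λ, β, and the mixture weights α_k are illustrative notation, not the paper's:

```latex
% Illustrative sketch only: \pi_\theta is the learned policy, \pi_0 the base
% model, and \bar{\pi}_t a mixture of previous iterates acting as opponent.
\[
\mathcal{L}_t(\theta) =
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim p_{\mathrm{data}}(\cdot\mid x),\; y' \sim \bar{\pi}_t(\cdot\mid x)}
  \left[ \ell\!\left(
      \lambda \log \frac{\pi_\theta(y\mid x)}{\bar{\pi}_t(y\mid x)}
    - \lambda \log \frac{\pi_\theta(y'\mid x)}{\bar{\pi}_t(y'\mid x)}
  \right) \right]
  + \beta\, \mathrm{KL}\!\left( \pi_\theta \,\|\, \pi_0 \right),
\qquad
\bar{\pi}_t = \sum_{k<t} \alpha_k \pi_k, \quad \sum_{k<t} \alpha_k = 1.
\]
```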
Abstract
The paper explores various regularization techniques to improve the performance and stability of the self-play fine-tuning (SPIN) approach for aligning large language models with human preferences.
Key highlights:
SPIN casts alignment as a self-play game: the chosen answers come from the supervised fine-tuning data, while the rejected answers are generated by the previous iterate. This setup can suffer from performance instability during training.
The authors propose two complementary approaches to address this issue, both illustrated in the code sketch after this list:
Incorporating an additional KL regularization term to keep the learned policy close to the base model.
Using a mixture of the previous iterates as the opponent, instead of just the most recent one, to smooth the learning process.
The proposed α-SPIN algorithm combines these two ideas and is evaluated on the MT-Bench and Hugging Face Open LLM Leaderboard benchmarks.
The results show that the KL regularization and the use of a mixture of previous iterates can improve the performance and stability of the SPIN approach.
The authors also investigate the use of fictitious play, where the opponent is an average of all previous iterates, as a further regularization technique.
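The Python sketch below shows how these pieces could fit together in one update, assuming a DPO-style implementation: the opponent is drawn from a mixture of previous iterates, and a KL penalty anchors the policy to the base model. The helper `score_response`, the `batch` fields, and the hyperparameters `lam` and `beta` are hypothetical stand-ins, not the paper's code.

```python
# Sketch of one alpha-SPIN-style update. All names, signatures, and
# hyperparameters here are illustrative assumptions, not the paper's code.
import random

import torch.nn.functional as F


def sample_opponent(iterates, weights):
    """Draw one previous iterate according to the mixture weights.

    Sampling a model index per batch is a simple way to sample from the
    mixture policy sum_k alpha_k * pi_k without materializing it.
    """
    return random.choices(iterates, weights=weights, k=1)[0]


def alpha_spin_loss(score_response, policy, opponent, base, batch,
                    lam=0.1, beta=0.01):
    """SPIN-style logistic loss plus a KL-to-base regularizer (sketch).

    `score_response(model, prompt, answer)` is an assumed helper returning
    the sequence log-probability of `answer` under `model`. `batch` holds a
    prompt, the human-written `chosen` answer, and a `rejected` answer
    generated by the sampled opponent.
    """
    lp = lambda m, a: score_response(m, batch["prompt"], a)
    # Log-ratio margin of the human answer over the opponent-generated one.
    margin = lam * (
        (lp(policy, batch["chosen"]) - lp(opponent, batch["chosen"]))
        - (lp(policy, batch["rejected"]) - lp(opponent, batch["rejected"]))
    )
    spin_term = -F.logsigmoid(margin).mean()
    # Crude sequence-level proxy for KL(pi_theta || pi_0), evaluated on the
    # chosen answers; it pulls the policy back toward the base model.
    kl_proxy = (lp(policy, batch["chosen"]) - lp(base, batch["chosen"])).mean()
    return spin_term + beta * kl_proxy
```

In this rendering, setting `weights` uniform over all iterates produced so far corresponds to the fictitious-play variant described above, while a non-uniform choice recovers the more general mixture opponent.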
Stats
No standout numerical statistics are extracted here; the paper's emphasis is on the regularization framework and its empirical evaluation on the benchmarks above.
Quotes
No direct quotes from the paper stand out as especially striking or as essential support for its key arguments.