Best-of-n sampling is an essentially optimal strategy for aligning large language models to human preferences, and the BoNBoN alignment method trains LLMs to mimic the best-of-n sampling distribution directly, achieving high win rates with minimal negative impact on off-target attributes.
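The distribution being mimicked is the one induced by the selection procedure sketched below; this is a minimal sketch of best-of-n sampling itself, not BoNBoN's training method, and `generate` and `reward` are hypothetical callables standing in for an LLM sampler and a reward model.

```python
# Illustrative best-of-n sampling sketch; `generate` and `reward` are hypothetical
# callables (LLM sampler and reward model), not part of BoNBoN's training code.
def best_of_n(prompt, n, generate, reward):
    """Draw n candidate responses and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))
```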
This paper introduces Evolving Alignment via Asymmetric Self-Play (eva), a novel framework for aligning large language models (LLMs) that improves upon traditional RLHF by dynamically evolving the prompt distribution during training, leading to more efficient and generalizable models.
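As a rough illustration of the asymmetric self-play idea, one can picture a loop in which a creator keeps the prompts that produce large reward gaps among sampled responses and evolves new prompts from them, while the solver is preference-trained on the result. The sketch below is a loose paraphrase under that assumption, with hypothetical helper callables; it is not eva's actual algorithm.

```python
# Loose sketch of an evolving-prompt alignment loop (illustrative only; eva's actual
# prompt scoring, evolution, and training steps are defined in the paper).
def evolve_prompt_pool(policy, prompt_pool, sample_responses, reward, evolve_prompt, n=4):
    """Score prompts by the reward gap among sampled responses and evolve the most informative ones."""
    scored = []
    for prompt in prompt_pool:
        rewards = [reward(prompt, r) for r in sample_responses(policy, prompt, n)]
        scored.append((max(rewards) - min(rewards), prompt))  # larger gap = more informative
    scored.sort(key=lambda pair: pair[0], reverse=True)
    informative = [prompt for _, prompt in scored[: max(1, len(scored) // 2)]]
    evolved = [evolve_prompt(policy, prompt) for prompt in informative]
    return prompt_pool + evolved  # the solver is then preference-trained on this evolved pool
```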
This research introduces a novel method for aligning large language models (LLMs) at inference time, enabling users to dynamically control the proficiency level of generated responses across single and multiple domains using Alignment Vectors (AVs) derived from model editing techniques.
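One plausible reading, stated here as an assumption rather than a confirmed detail of the paper, is that an AV behaves like a task-arithmetic weight delta that can be scaled at inference time to dial proficiency; a minimal sketch under that assumption:

```python
# Hedged sketch: an Alignment Vector treated as a weight delta between an aligned model
# and its base model, scaled at inference time. Illustrative assumption, not the paper's
# stated extraction recipe.
def extract_alignment_vector(base_state, aligned_state):
    """Per-parameter difference between the aligned model's weights and the base model's."""
    return {name: aligned_state[name] - base_state[name] for name in base_state}

def apply_alignment_vector(base_state, alignment_vector, strength=1.0):
    """Blend the AV into the base weights; `strength` dials the proficiency level."""
    return {name: base_state[name] + strength * alignment_vector[name] for name in base_state}
```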
α-DPO is a new preference optimization algorithm that improves large language model alignment by introducing a dynamic reward margin, addressing limitations of DPO and SimPO and outperforming baseline models on benchmarks such as AlpacaEval 2 and Arena-Hard.
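For context, the margin-augmented DPO objective that this family of methods builds on can be sketched as below, where γ(x, y_w, y_l) is a placeholder for the margin term; α-DPO's contribution is making that margin dynamic, and its exact form is given in the paper (this is a sketch, not the paper's formula):

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \;-\; \gamma(x, y_w, y_l)\right)\right]
$$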
SparsePO improves the alignment of large language models with human preferences by selectively weighting the importance of individual tokens when computing rewards and KL divergence during preference optimization, leading to better performance on helpfulness, code-generation, and summarization tasks.
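A minimal sketch of token-level weighting inside a DPO-style pairwise loss follows; the mask values are taken as given inputs and the loss is illustrative of the idea, not SparsePO's exact objective or its mask-learning procedure.

```python
import math

# Illustrative token-weighted pairwise preference loss (not SparsePO's exact objective).
def masked_logratio(policy_logprobs, ref_logprobs, token_mask):
    """Sum per-token (policy - reference) log-probs, each weighted by a sparse mask value."""
    return sum(m * (lp - lr) for lp, lr, m in zip(policy_logprobs, ref_logprobs, token_mask))

def pairwise_loss(chosen_logratio, rejected_logratio, beta=0.1):
    """DPO-style loss on masked sequence log-ratios; masked-out tokens contribute nothing."""
    margin = beta * (chosen_logratio - rejected_logratio)
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```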
Aligning large language models (LLMs) with human preferences is more effective when training uses online data and the learned LLM is constrained to stay close to the LLM that generated that training data.
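That recommendation corresponds to the standard KL-regularized objective below, written here as a sketch rather than a formula taken from the paper, with reward model r, regularization coefficient β, and π_gen denoting the policy that generated the training data:

$$
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[r(x, y)\big] \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\big\|\,\pi_{\mathrm{gen}}(\cdot \mid x)\big)\Big]
$$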
This research paper introduces INPO, a novel online algorithm leveraging no-regret learning to align large language models with general human preferences, achieving superior performance compared to existing online RLHF methods.
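To illustrate the no-regret flavor, a generic multiplicative-weights / mirror-descent update over policies in a preference game can be written as below, with learning rate η and preference probability P(y ≻ y' | x); this is an illustrative update of that general family, not INPO's actual loss, which the paper derives differently:

$$
\pi_{t+1}(y \mid x) \;\propto\; \pi_t(y \mid x)\,\exp\!\Big(\eta\, \mathbb{E}_{y' \sim \pi_t(\cdot \mid x)}\big[\mathbb{P}(y \succ y' \mid x)\big]\Big)
$$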