Understanding Provably Robust DPO: Aligning Language Models with Noisy Feedback
The paper introduces a robust framework for policy optimization in the presence of noisy preference data, built on the Direct Preference Optimization (DPO) algorithm. By designing a novel loss function that corrects for random flips in the preference labels, the proposed robust DPO (rDPO) policy is provably robust to label noise, a guarantee that standard DPO and related methods lack.
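The core idea can be illustrated with the standard unbiased-loss correction for symmetric label noise: if each preference label is flipped with a known rate eps < 0.5, the per-example loss is reweighted so that its expectation over flips equals the clean loss. Below is a minimal sketch under that assumption; the function names, the scalar log-ratio inputs, and the use of a known flip rate `eps` are illustrative, not the paper's exact implementation.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_loss(beta, chosen_logratio, rejected_logratio):
    # Standard DPO loss for one preference pair:
    #   -log sigmoid(beta * (h_chosen - h_rejected)),
    # where each h is log pi_theta(y|x) - log pi_ref(y|x) for that response.
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))

def rdpo_loss(beta, chosen_logratio, rejected_logratio, eps):
    # Debiased loss under symmetric label-flip noise with rate eps < 0.5:
    # a weighted combination of the loss on the observed pair and the loss
    # on the flipped pair, scaled by 1/(1 - 2*eps). Its expectation over
    # random flips recovers the clean DPO loss.
    assert 0.0 <= eps < 0.5
    observed = dpo_loss(beta, chosen_logratio, rejected_logratio)
    flipped = dpo_loss(beta, rejected_logratio, chosen_logratio)
    return ((1 - eps) * observed - eps * flipped) / (1 - 2 * eps)
```

With eps = 0 the corrected loss reduces to standard DPO, and for any eps the flip-averaged corrected loss matches the clean loss exactly, which is what makes the resulting gradient an unbiased estimate of the noise-free one.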