The authors introduce ORPO, a novel reference-free, monolithic preference alignment method that folds preference optimization directly into supervised fine-tuning (SFT), removing the need for a separate reference model or a distinct alignment stage. Emphasizing the central role of SFT in preference alignment, they report that ORPO outperforms existing alignment methods across a range of model scales while remaining efficient to train.
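To make the idea concrete, here is a minimal PyTorch sketch of the ORPO objective: the standard SFT negative log-likelihood on the chosen response plus a log-odds-ratio penalty contrasting chosen and rejected responses. It assumes `chosen_logps`/`rejected_logps` are length-averaged sequence log-probabilities from the policy model; the function name, argument names, and the default `lam` value are illustrative, not from the paper verbatim.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              chosen_nll: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """Sketch of the ORPO objective: L_SFT + lam * L_OR.

    chosen_logps / rejected_logps: length-averaged log P(y|x) per example,
        so exp(logp) is the (geometric-mean) sequence probability.
    chosen_nll: SFT negative log-likelihood on the chosen response.
    lam: weight on the odds-ratio term (lambda in the paper).
    """
    # log odds(y|x) = log P - log(1 - P); log1p(-exp(logp)) gives log(1 - P)
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio loss: -log sigmoid of the log-odds difference, which pushes
    # the model to assign higher odds to chosen than to rejected responses.
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    return (chosen_nll + lam * l_or).mean()
```

Because both terms depend only on the policy model's own probabilities, no frozen reference model is needed, which is what makes the method reference-free and trainable in a single monolithic SFT pass.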