Prior Constraints-based Reward Model Training to Improve Alignment of Large Language Models
Incorporating prior constraints on length ratio and cosine similarity during reward model training can effectively regulate the optimization magnitude and control the score margins between chosen and rejected responses, leading to improved alignment of large language models.
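The idea of a prior-constrained margin in pairwise reward training can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the margin formula, the coefficients `alpha` and `beta`, and all function names are assumptions introduced here for clarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain-Python sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def constrained_margin(len_chosen, len_rejected, emb_chosen, emb_rejected,
                       alpha=1.0, beta=1.0):
    """Target score margin derived from prior constraints (illustrative).

    Assumption: pairs with similar lengths and similar embeddings get a
    smaller target margin, limiting how far the reward model pushes
    their scores apart.
    """
    length_term = abs(math.log(len_chosen / len_rejected))
    sim_term = 1.0 - cosine_similarity(emb_chosen, emb_rejected)
    return alpha * length_term + beta * sim_term


def pairwise_loss(score_chosen, score_rejected, margin):
    """Bradley-Terry style pairwise loss with a prior-constrained margin."""
    return math.log(1.0 + math.exp(-(score_chosen - score_rejected - margin)))
```

In this sketch, a near-identical response pair (equal lengths, identical embeddings) yields a zero margin, so the loss does not force their scores apart; dissimilar pairs receive a larger margin, letting the constraints regulate the optimization magnitude as described above.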