Improving Reward Generalization in Reinforcement Learning from Human Feedback through Dataset Information Structure Design
Careful design of the information structure of the human preference dataset, that is, which responses are compared with which, can significantly improve the generalization performance of the reward model in RLHF, without changing the feedback collection mechanism or increasing the amount of feedback.
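One way to make "information structure" concrete is the comparison graph a preference dataset induces over responses: nodes are responses, edges are human comparisons. The sketch below is a hypothetical illustration (not the paper's method): it spends the same comparison budget two different ways and checks graph connectivity, since a reward model fit on pairwise preferences (e.g., Bradley-Terry) can only relate responses that lie in the same connected component.

```python
# Hypothetical illustration: equal comparison budgets can induce very
# different comparison graphs over responses to a prompt. A connected
# graph lets a pairwise reward model relate every response to every
# other; a fragmented graph leaves some responses mutually incomparable.

def components(n, edges):
    """Count connected components of a graph on n nodes via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            comps -= 1
    return comps

n = 8  # responses to a single prompt

# Structure A: a chain -- each response compared with the next (7 comparisons).
chain = [(i, i + 1) for i in range(n - 1)]

# Structure B: disjoint pairs, with three pairs re-queried (also 7 comparisons).
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (0, 1), (2, 3), (4, 5)]

print(components(n, chain))  # 1: every response is comparable to every other
print(components(n, pairs))  # 4: no information flows between pairs
```

Both datasets cost seven human judgments, yet only the chain lets preference information propagate across all eight responses, which is the kind of structural difference, independent of feedback volume, that the thesis above points to.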