Regularized Reward Learning for Robust Robotic Reinforcement Learning from Human Feedback
Introducing "agent preference", a novel regularization technique that mitigates reward overoptimization in preference-based reinforcement learning from human feedback for robotic control.