Sample-Efficient Preference-based Reinforcement Learning with Dynamics-Aware Rewards
The author argues that dynamics-aware reward functions significantly improve the sample efficiency of preference-based reinforcement learning, yielding both faster policy learning and better final policy performance.