Efficient Preference-Based Reinforcement Learning with Reward-Agnostic Exploration
The authors propose a novel theoretical framework for preference-based reinforcement learning (PbRL) that decouples interaction with the environment from the collection of human feedback. This decoupling allows the optimal policy to be learned efficiently under a linear reward parametrization and unknown transitions.
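To make the linear reward parametrization concrete, the following is a minimal sketch of the reward-learning step only, not the authors' algorithm. It assumes a Bradley-Terry (logistic) preference model, which the summary above does not specify, and it replaces environment exploration with randomly drawn trajectory features. The names `fit_linear_reward`, `demo`, and `theta_true` are hypothetical. The feature pairs are fixed before any preference label is drawn, which loosely mirrors the separation between data collection and feedback.

```python
import numpy as np


def fit_linear_reward(phi_a, phi_b, prefs, lr=0.5, steps=2000):
    """Fit theta by gradient ascent on the logistic preference log-likelihood.

    phi_a, phi_b: (n, d) arrays of trajectory features for each compared pair.
    prefs: (n,) array, 1.0 if trajectory a was preferred, else 0.0.
    Assumption: P(a preferred) = sigmoid((phi_a - phi_b) @ theta).
    """
    diff = phi_a - phi_b
    theta = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diff @ theta))   # predicted P(a preferred)
        grad = diff.T @ (prefs - p) / len(prefs)  # gradient of mean log-likelihood
        theta += lr * grad
    return theta


def demo(seed=0, n=5000, d=4):
    """Simulate preferences from a known theta and check the recovered direction."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -0.5, 0.25, 2.0])  # hypothetical ground truth
    # Stand-in for trajectories gathered without access to the reward.
    phi_a = rng.normal(size=(n, d))
    phi_b = rng.normal(size=(n, d))
    # Simulated human feedback, queried only after the pairs are fixed.
    p = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ theta_true))
    prefs = (rng.random(n) < p).astype(float)
    theta_hat = fit_linear_reward(phi_a, phi_b, prefs)
    cos = theta_hat @ theta_true / (
        np.linalg.norm(theta_hat) * np.linalg.norm(theta_true)
    )
    return theta_hat, cos


if __name__ == "__main__":
    theta_hat, cos = demo()
    print("estimated theta:", theta_hat)
    print("cosine similarity to true theta:", cos)
```

Since preferences only compare trajectories, the sketch evaluates the estimate by its direction (cosine similarity) rather than by exact parameter values.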