Efficient Reinforcement Learning Algorithms for Diverse Learning Goals: PAC, Reward-Free, Preference-Based, and Beyond
This paper presents a unified algorithmic framework for a broad range of reinforcement learning goals, including PAC learning, reward-free learning, model estimation, and preference-based learning. The framework builds on a generalized notion of the Decision-Estimation Coefficient (DEC) that captures the intrinsic sample complexity of each learning goal.