Core Concepts
This paper presents a unified algorithmic framework for a broad range of reinforcement learning goals, including PAC learning, reward-free learning, model estimation, and preference-based learning. The framework is based on a generalized notion of the Decision-Estimation Coefficient (DEC) that captures the intrinsic sample complexity for each learning goal.
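For background, the original regret-based DEC of Foster et al. (2021), which the paper's G-DEC generalizes, is defined for a model class \(\mathcal{M}\), reference model \(\bar{M}\), and exploration parameter \(\gamma > 0\) as

\[
\mathrm{dec}_\gamma(\mathcal{M}, \bar{M}) \;=\; \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}} \mathbb{E}_{\pi \sim p}\Big[\, f^M(\pi_M) - f^M(\pi) \;-\; \gamma\, D_{\mathrm{H}}^2\big(M(\pi), \bar{M}(\pi)\big) \Big],
\]

where \(\Pi\) is the policy space, \(f^M(\pi)\) is the value of policy \(\pi\) under model \(M\), \(\pi_M\) is the optimal policy for \(M\), and \(D_{\mathrm{H}}^2\) is the squared Hellinger distance between the observation distributions that \(\pi\) induces under \(M\) and \(\bar{M}\). The G-DEC replaces the regret term \(f^M(\pi_M) - f^M(\pi)\) with a goal-specific risk.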
Abstract
The paper proposes a unified algorithmic framework, called G-E2D (generalized Estimation-to-Decisions), that handles a wide range of reinforcement learning goals beyond no-regret learning. The key idea is a generalized notion of the Decision-Estimation Coefficient (G-DEC) that captures the intrinsic sample complexity of each specific learning goal.
The paper first introduces the PAC Decision-Estimation Coefficient (PACDEC) for PAC reinforcement learning, showing that a bounded PACDEC is both necessary and sufficient for sample-efficient PAC learning. It then generalizes this concept to the G-DEC, which handles other learning goals such as the following (a schematic sketch of these goal-specific coefficients follows the list):
Reward-free learning: The goal is to explore the environment efficiently without observing rewards, so that a near-optimal policy can be computed for any reward function using the collected data.
Model estimation: The goal is to estimate the underlying environment model accurately, rather than just learning an optimal policy.
Preference-based learning: Performance is measured by human preference feedback (e.g., comparisons between trajectories) rather than by numerical rewards.
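As referenced above, here is a schematic sketch of how these goal-specific coefficients arise from the DEC's minimax template; the rendering is illustrative and the notation is ours, not necessarily the paper's exact definitions. The PACDEC decouples the distribution used to explore from the distribution used to output a policy guess:

\[
\mathrm{pacdec}_\gamma(\mathcal{M}, \bar{M}) \;=\; \inf_{p_{\mathrm{exp}},\, p_{\mathrm{out}} \in \Delta(\Pi)} \sup_{M \in \mathcal{M}} \Big\{ \mathbb{E}_{\pi \sim p_{\mathrm{out}}}\big[ f^M(\pi_M) - f^M(\pi) \big] \;-\; \gamma\, \mathbb{E}_{\pi \sim p_{\mathrm{exp}}}\big[ D_{\mathrm{H}}^2\big(M(\pi), \bar{M}(\pi)\big) \big] \Big\}.
\]

Each G-DEC then swaps the suboptimality term \(\mathbb{E}_{\pi \sim p_{\mathrm{out}}}[f^M(\pi_M) - f^M(\pi)]\) for a goal-specific risk: a supremum over reward functions for reward-free learning, a distance between the estimated and true model for model estimation, and a preference-based notion of suboptimality for preference-based learning.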
For each of these learning goals, the paper defines the corresponding G-DEC and proposes a G-E2D algorithm whose sample complexity is upper-bounded in terms of the G-DEC. It also shows that the G-DEC yields an information-theoretic lower bound for the same goal, so the G-DEC characterizes the intrinsic statistical difficulty of that goal.
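To make the estimation-to-decisions template concrete, here is a minimal runnable sketch for the simplest instantiation: a two-armed Bernoulli bandit with a finite model class, where the minimax problem over policy distributions is solved by grid search. This is a toy illustration, not the paper's G-E2D in full generality; all names (MODELS, GAMMA, solve_dec_minimax, etc.) are our own, and the goal-specific risk is specialized here to plain regret.

import math
import random

# Toy estimation-to-decisions loop for a two-armed Bernoulli bandit with a
# finite model class. Schematic sketch: the goal-specific risk is specialized
# to regret, and the minimax over policy distributions is a grid search.

MODELS = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.4)]  # candidate arm-mean vectors (hypothetical)
GAMMA = 4.0                                     # exploration parameter gamma

def hellinger_sq(p, q):
    # Squared Hellinger distance between Bernoulli(p) and Bernoulli(q).
    return 1.0 - (math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q)))

def log_likelihood(model, history):
    # Log-likelihood of observed (arm, reward) pairs under a candidate model.
    return sum(math.log(model[arm] if r == 1 else 1.0 - model[arm])
               for arm, r in history)

def solve_dec_minimax(ref, grid=201):
    # Approximate inf_p sup_M E_{a~p}[max(M) - M[a] - GAMMA * D_H^2(M[a], ref[a])]
    # by grid search over p = P(play arm 1); exact methods would solve a matrix game.
    best_p, best_val = 0.5, float("inf")
    for i in range(grid):
        p1 = i / (grid - 1)
        dist = (1.0 - p1, p1)
        val = max(sum(dist[a] * (max(m) - m[a] - GAMMA * hellinger_sq(m[a], ref[a]))
                      for a in range(2))
                  for m in MODELS)
        if val < best_val:
            best_p, best_val = p1, val
    return best_p

def e2d(env_pull, T=200):
    history = []
    for _ in range(T):
        # Estimation oracle: maximum-likelihood model over the finite class.
        ref = max(MODELS, key=lambda m: log_likelihood(m, history))
        # Decision step: play an arm drawn from the minimax distribution.
        p1 = solve_dec_minimax(ref)
        arm = 1 if random.random() < p1 else 0
        history.append((arm, env_pull(arm)))
    return history

# Example run against a ground-truth model inside the class:
truth = MODELS[0]
history = e2d(lambda a: int(random.random() < truth[a]))

For the multi-goal G-E2D, the regret term inside solve_dec_minimax would be replaced by the goal-specific risk from the corresponding G-DEC, and the final output (a policy, a model estimate, or the exploration dataset) would depend on the goal.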
Furthermore, the paper shows that the G-DEC framework recovers and unifies many existing structural conditions for sample-efficient reinforcement learning, such as Bellman rank, Eluder dimension, and Bellman-Eluder dimension. It also establishes connections between G-E2D and other model-based RL approaches, such as Posterior Sampling and algorithms based on Maximum Likelihood Estimation.