Core Concepts
This paper introduces a novel baseline for policy gradient reinforcement learning algorithms that uses optimal control theory to guide exploration and improve learning efficiency, especially in environments with sparse rewards.
Lyu, X., Li, S., Siriya, S., Pu, Y., & Chen, M. (2024). Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods. arXiv preprint arXiv:2011.02073v5.
The central problem it tackles is inefficient exploration in reinforcement learning (RL), which is especially pronounced when rewards are sparse; the proposed baseline function, derived from optimal control theory, is designed to steer exploration during policy gradient updates.
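To make the role of the baseline concrete, below is a minimal sketch of a REINFORCE-style policy gradient update in which the advantage is the discounted return minus a baseline value. The `optimal_control_baseline` stub (a quadratic, LQR-like placeholder), the linear-softmax policy, and all names here are illustrative assumptions, not the paper's implementation; the sketch only shows where an optimal control-derived baseline would plug into the update.

```python
import numpy as np


def optimal_control_baseline(state):
    """Hypothetical placeholder for an optimal control-based baseline.

    Here it is just a quadratic value surrogate V(s) ~= -s^T Q s (an LQR-style
    assumption for illustration); the paper derives its baseline from an
    optimal control formulation instead.
    """
    Q = np.eye(len(state))
    return -float(state @ Q @ state)


def softmax_policy(theta, state):
    """Linear-softmax policy over discrete actions; theta has shape (n_actions, state_dim)."""
    logits = theta @ state
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def reinforce_with_baseline_update(theta, trajectory, gamma=0.99, lr=1e-2):
    """One REINFORCE update where the advantage is the return minus the baseline.

    trajectory: list of (state, action, reward) tuples from a single episode.
    """
    G = 0.0
    grad = np.zeros_like(theta)
    # Walk the episode backwards to accumulate discounted returns.
    for state, action, reward in reversed(trajectory):
        G = reward + gamma * G
        advantage = G - optimal_control_baseline(state)   # baseline enters here
        probs = softmax_policy(theta, state)
        # grad of log pi(a|s) for a linear-softmax policy:
        # (1[a' == a] - pi(a'|s)) * s for each action row a'
        grad_log = -np.outer(probs, state)
        grad_log[action] += state
        grad += advantage * grad_log
    return theta + lr * grad          # gradient ascent step


# Tiny usage example with random data (2 actions, 3-dimensional states).
rng = np.random.default_rng(0)
theta = np.zeros((2, 3))
episode = [(rng.normal(size=3), int(rng.integers(2)), float(rng.normal()))
           for _ in range(10)]
theta = reinforce_with_baseline_update(theta, episode)
```

Subtracting any state-dependent baseline leaves the expected policy gradient unchanged while reducing variance; the paper's contribution is the specific choice of that baseline, derived from optimal control, so that updates guide exploration in sparse-reward settings.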