Tasdighi, B., Haussmann, M., Werge, N., Wu, Y.-S., & Kandemir, M. (2024). Deep Exploration with PAC-Bayes. arXiv preprint arXiv:2402.03055v2.
This paper addresses deep exploration in continuous control tasks with sparse rewards, where an agent must learn efficiently despite rarely observing any reward signal.
The researchers develop a novel algorithm called PAC-Bayesian Actor-Critic (PBAC) by formulating the deep exploration problem from a Probably Approximately Correct (PAC) Bayesian perspective. They quantify the Bellman operator error using a generic PAC-Bayes bound, treating a bootstrapped ensemble of critic networks as an empirical posterior distribution. A data-informed function-space prior is constructed from the corresponding target networks. The algorithm utilizes posterior sampling during training for exploration and Bayesian model averaging during evaluation.
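To make the mechanics concrete, here is a minimal PyTorch sketch of the three ingredients described above: a bootstrapped critic ensemble treated as an empirical posterior, per-episode posterior sampling for exploration, and Bayesian model averaging at evaluation. The names (`CriticEnsemble`, `n_members`) are illustrative assumptions rather than the authors' code, and the PAC-Bayes bound itself, which ties the ensemble to the target-network prior, is omitted.

```python
import torch
import torch.nn as nn

class CriticEnsemble(nn.Module):
    """Bootstrapped ensemble of Q-networks, treated as an empirical posterior.
    (Illustrative sketch; not the authors' implementation.)"""
    def __init__(self, obs_dim, act_dim, n_members=5, hidden=256):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        )

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        # (n_members, batch, 1): one Q-estimate per posterior sample
        return torch.stack([m(x) for m in self.members])

obs, act = torch.randn(32, 8), torch.randn(32, 2)
critics = CriticEnsemble(obs_dim=8, act_dim=2)
targets = CriticEnsemble(obs_dim=8, act_dim=2)  # target copies; their function space anchors the prior

# Training-time exploration: posterior sampling -- draw one ensemble
# member at the start of an episode and act greedily w.r.t. it.
k = torch.randint(len(critics.members), ()).item()
q_explore = critics(obs, act)[k]

# Evaluation: Bayesian model averaging across all members.
q_eval = critics(obs, act).mean(dim=0)
```

Committing to a single sampled critic for an entire episode is what produces temporally consistent ("deep") exploration, in contrast to injecting independent noise at every step.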
The study presents PBAC as an effective approach to deep exploration in continuous control tasks with sparse rewards: the PAC-Bayes bound gives a principled way to quantify the critic's uncertainty and to direct exploration toward it.
This research contributes to deep reinforcement learning by introducing a novel and effective algorithm for exploration in sparse-reward settings, which matters for real-world applications where reward signals are rare or delayed.