The paper introduces the Stationary Objectives for Exploration (SOFE) framework to address the non-stationarity of exploration bonuses in reinforcement learning (RL). Exploration bonuses, such as count-based rewards, pseudo-counts, and state-entropy maximization, are non-stationary because the reward assigned to a given state changes as the agent accumulates experience (e.g., visitation counts grow during training). This non-stationarity makes the exploration objectives harder for RL agents to optimize, leading to suboptimal performance.
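As a minimal illustration of this non-stationarity (not code from the paper; the function name and the 1/sqrt(N) bonus form are standard choices used here for exposition), a count-based bonus returns a different value for the same state at different points in training:

```python
import math
from collections import defaultdict

# Hypothetical count-based bonus: r_int(s) = 1 / sqrt(N(s)).
# Because N(s) grows every time the agent visits s, the reward for the
# *same* state shrinks over training -- the objective is non-stationary.
visit_counts = defaultdict(int)

def count_based_bonus(state) -> float:
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])
```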
SOFE proposes to augment the state representation with the sufficient statistics of the exploration bonus, effectively transforming the non-stationary reward into a stationary one. Conditioned on these statistics, the bonus becomes a fixed (Markovian) function of the augmented state, which allows RL agents to optimize the exploration objective more effectively.
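The sketch below illustrates this idea for a small tabular environment whose observations are integer state indices (e.g., Gymnasium's FrozenLake). It is an assumption-laden illustration rather than the authors' implementation: the wrapper name, the one-hot encoding, and the count normalization are choices made here, and the sufficient statistic of a count-based bonus is taken to be the visitation-count vector, which gets concatenated to the observation.

```python
import numpy as np
import gymnasium as gym

class CountAugmentedObs(gym.Wrapper):
    """Illustrative SOFE-style wrapper: the observation is augmented with the
    visitation counts (the sufficient statistics of a count-based bonus), so
    the augmented reward depends only on the augmented state."""

    def __init__(self, env, n_states: int):
        super().__init__(env)
        # Counts persist across episodes, as in classic count-based exploration.
        self.counts = np.zeros(n_states, dtype=np.float32)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._augment(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.counts[obs] += 1.0
        bonus = 1.0 / np.sqrt(self.counts[obs])  # count-based intrinsic reward
        return self._augment(obs), reward + bonus, terminated, truncated, info

    def _augment(self, obs):
        # Concatenate the one-hot state with the (normalized) count vector,
        # i.e., the sufficient statistics of the bonus.
        one_hot = np.eye(len(self.counts), dtype=np.float32)[obs]
        norm_counts = self.counts / (1.0 + self.counts.sum())
        return np.concatenate([one_hot, norm_counts])
```

With this augmentation, two visits to the same underlying state at different count levels produce different augmented states, so the reward no longer appears to change over time from the agent's perspective.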
The paper evaluates SOFE across various environments and exploration modalities, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments. The results show that SOFE significantly improves the performance of RL agents compared to vanilla exploration bonuses, enabling better exploration and higher task rewards. SOFE provides orthogonal gains to different exploration objectives, including count-based methods, pseudo-counts, and state-entropy maximization.
Furthermore, the paper demonstrates that SOFE scales to high-dimensional environments, where it improves the performance of the state-of-the-art exploration algorithm, E3B, in procedurally generated environments. The authors also show that SOFE is agnostic to the RL algorithm used and provides consistent improvements across various RL methods, including A2C, PPO, and SAC.
Source: https://arxiv.org/pdf/2310.18144.pdf