Core Concepts
Leveraging an adaptive skill distribution based on local entropy optimization can significantly improve exploration efficiency in goal-conditioned reinforcement learning tasks with sparse rewards and long horizons.
Summary
The paper introduces a novel framework called Goal Exploration via Adaptive Skill Distribution (GEASD) to address the challenge of efficient exploration in goal-conditioned reinforcement learning (GCRL) tasks with sparse rewards and long horizons.
Key highlights:
- GEASD employs an adaptive skill distribution that optimizes the local entropy of achieved goals within a historical context, enabling the agent to exploit structural patterns of the environment and perform deep exploration.
- The skill distribution is modeled as a Boltzmann distribution derived from skill value functions, which capture the expected local entropy change for each skill (see the sketch after this list).
- Novel intrinsic rewards are introduced to learn the skill value functions, reflecting the local entropy variations as the historical context evolves.
- A theoretical analysis justifies the Boltzmann form of the skill distribution and establishes the relationship between local entropy changes and the overall exploration objective.
- Experiments on two challenging GCRL tasks demonstrate that GEASD significantly outperforms state-of-the-art exploration methods in terms of exploration efficiency, success rate, and entropy of achieved goals.
- The learned skill distribution also exhibits robust generalization capabilities, enabling efficient exploration in unseen tasks with similar local structures.
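To make the mechanism above concrete, here is a minimal, self-contained sketch of a Boltzmann skill distribution over skill value estimates and a local-entropy-based intrinsic reward computed over a sliding window of achieved goals. All function names, the histogram entropy estimator, the window size, and the temperature are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_entropy(goals, bins=10, goal_range=(0.0, 1.0)):
    """Histogram-based entropy estimate over a window of achieved goals.
    The estimator, bin count, and goal range are illustrative assumptions."""
    goals = np.asarray(goals)
    hist, _ = np.histogramdd(goals, bins=bins, range=[goal_range] * goals.shape[1])
    p = hist.ravel() / max(hist.sum(), 1.0)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def boltzmann_skill_distribution(skill_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over skills from their value estimates."""
    logits = np.asarray(skill_values, dtype=float) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def local_entropy_intrinsic_reward(history, new_goal, window=50):
    """Change in local entropy of the historical context when a new goal is achieved."""
    old_ctx = history[-window:]
    new_ctx = (history + [new_goal])[-window:]
    return local_entropy(new_ctx) - local_entropy(old_ctx)

# Toy usage: sample a skill from the adaptive distribution, then score a transition.
rng = np.random.default_rng(0)
skill_values = [0.2, 1.1, 0.4, 0.9]  # placeholder per-skill value estimates
probs = boltzmann_skill_distribution(skill_values, temperature=0.5)
skill = rng.choice(len(skill_values), p=probs)
history = [rng.uniform(0.0, 1.0, size=2) for _ in range(60)]
r_int = local_entropy_intrinsic_reward(history, rng.uniform(0.0, 1.0, size=2))
print(skill, round(r_int, 4))
```

In GEASD the skill value functions would be learned from such intrinsic rewards; the fixed placeholder values above only illustrate how the Boltzmann distribution is formed from them.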
Statistics
The agent's success rate in reaching the desired goal is a key metric used to evaluate exploration efficiency.
The entropy of achieved goals is another important metric that estimates the coverage of the explored space.
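As a rough illustration of how these two evaluation metrics could be computed from logged rollouts, the sketch below uses a distance threshold for success and a histogram entropy over all achieved goals; the threshold, bin count, and data layout are assumptions, not taken from the paper.

```python
import numpy as np

def success_rate(final_distances, threshold=0.05):
    """Fraction of evaluation episodes ending within a threshold of the desired goal."""
    return float(np.mean(np.asarray(final_distances) <= threshold))

def achieved_goal_entropy(achieved_goals, bins=20):
    """Histogram entropy of all achieved goals; higher values indicate broader coverage."""
    hist, _ = np.histogramdd(np.asarray(achieved_goals), bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Toy usage with random data standing in for real evaluation logs.
rng = np.random.default_rng(1)
final_distances = rng.uniform(0.0, 0.2, size=100)        # final distance to desired goal per episode
achieved_goals = rng.uniform(0.0, 1.0, size=(1000, 2))   # achieved goals collected during training
print(success_rate(final_distances), round(achieved_goal_entropy(achieved_goals), 4))
```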
Quotes
"Exploration efficiency poses a significant challenge in goal-conditioned reinforcement learning (GCRL) tasks, particularly those with long horizons and sparse rewards."
"We introduce a novel framework, GEASD, designed to capture these patterns through an adaptive skill distribution during the learning process."
"Our experiments reveal marked improvements in exploration efficiency using the adaptive skill distribution compared to a uniform skill distribution."