Key Concepts
An innovative refining scheme for reinforcement learning that incorporates explanation methods to break through training bottlenecks by constructing a mixed initial state distribution and encouraging exploration.
Summary
The paper proposes RICE, a Refining scheme for ReInforCement learning with Explanation, to address the challenge of obtaining an optimally performing deep reinforcement learning (DRL) agent for complex tasks, especially with sparse rewards.
Key highlights:
- RICE leverages a state-of-the-art explanation method, StateMask, to identify the most critical states (i.e., steps that contribute the most to the final reward of a trajectory).
- Based on the explanation results, RICE constructs a mixed initial state distribution that combines the default initial states and the identified critical states to prevent overfitting.
- RICE further incentivizes the agent to explore starting from the identified critical states using Random Network Distillation (RND) as the exploration bonus.
- Theoretical analysis shows that RICE achieves a tighter sub-optimality bound by utilizing the mixed initial distribution.
- Extensive evaluations on simulated games and real-world applications demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
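The two mechanisms highlighted above, the mixed initial state distribution and the RND exploration bonus, can be sketched in a few lines. This is an illustrative NumPy sketch under simplifying assumptions, not the paper's implementation: the names `sample_initial_state`, `RNDBonus`, and `mix_ratio` are hypothetical, and the RND target/predictor "networks" are reduced to linear maps for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_initial_state(default_states, critical_states, mix_ratio=0.5):
    """Sample a reset state from a hypothetical mixed initial distribution:
    with probability `mix_ratio` restart from a critical state identified
    by the explanation method, otherwise from a default initial state."""
    if critical_states and rng.random() < mix_ratio:
        return critical_states[rng.integers(len(critical_states))]
    return default_states[rng.integers(len(default_states))]

class RNDBonus:
    """Random Network Distillation, simplified to linear maps: the bonus is
    the prediction error of a trainable predictor against a fixed, randomly
    initialized target, so rarely visited states yield larger bonuses."""
    def __init__(self, state_dim, feat_dim=32):
        self.target = rng.normal(size=(state_dim, feat_dim))  # fixed random map
        self.predictor = np.zeros((state_dim, feat_dim))      # trained online

    def bonus(self, state, lr=0.01):
        err = state @ self.predictor - state @ self.target
        # gradient step on the predictor toward the target's features,
        # so the bonus for a repeatedly visited state shrinks over time
        self.predictor -= lr * np.outer(state, err)
        return float(np.mean(err ** 2))
```

A refining loop would then reset episodes via `sample_initial_state` and add `rnd.bonus(state)` to the environment reward when starting from a critical state, which is the exploration incentive the paper attributes to RND.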
Statistics
The paper reports the following key metrics:
- Final reward of the target agent before and after refining, across various applications
- Fidelity scores of the explanation methods (StateMask and the proposed method)
- Training time and sample efficiency of the explanation methods
Quotes
"The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states."
"Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound."