Core Concepts
HOP is a modularity-based approach inspired by Progressive Neural Networks (PNN) that mitigates catastrophic forgetting in continual reinforcement learning by dynamically forming a hierarchy of policies based on state similarity. It outperforms PPO (Proximal Policy Optimization) and matches the performance of PNN without requiring explicit task labels.
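The core idea of routing states to policies by similarity can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the `PolicyRouter` class, the cosine-similarity measure, the running-mean prototypes, and the `threshold` parameter are all illustrative assumptions about how similarity-based policy selection might work.

```python
import numpy as np

class PolicyRouter:
    """Illustrative sketch only (not HOP's implementation): route each
    incoming state to the most similar stored policy prototype, and
    spawn a new policy when no prototype is similar enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold   # hypothetical similarity cutoff
        self.prototypes = []         # one mean state vector per policy
        self.counts = []             # number of states assigned to each policy

    def _similarity(self, state, proto):
        # Cosine similarity between a state and a policy's prototype.
        denom = np.linalg.norm(state) * np.linalg.norm(proto)
        return float(np.dot(state, proto) / denom) if denom > 0 else 0.0

    def route(self, state):
        """Return the index of the policy that should handle `state`."""
        state = np.asarray(state, dtype=float)
        if self.prototypes:
            sims = [self._similarity(state, p) for p in self.prototypes]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Matched an existing policy: update its prototype
                # with a running mean of the states it has seen.
                self.counts[best] += 1
                self.prototypes[best] += (state - self.prototypes[best]) / self.counts[best]
                return best
        # No sufficiently similar policy: create a new one.
        self.prototypes.append(state.copy())
        self.counts.append(1)
        return len(self.prototypes) - 1
```

Under this sketch, states from environments with similar dynamics land on the same policy (enabling transfer), while dissimilar states trigger new policies (protecting old ones from being overwritten).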
Stats
HOP outperforms PPO in continual learning scenarios, recovering performance faster after task switches and reaching higher final performance.
HOP requires 1.04 million steps to recover performance in the StarPilot-Climber experiment, compared to 2.68 million for PPO.
HOP achieves a final average reward of 18.15 in the StarPilot-Climber experiment, compared to 12.14 for PPO.
HOP demonstrates substantial transfer between environments with similar dynamics, such as Ninja and CoinRun.
HOP forms 18 hierarchical policy levels during the experiments.