Key Concepts
Jointly learning context representations and policy enables improved zero-shot generalization in reinforcement learning.
Summary
This paper proposes a novel reinforcement learning (RL) algorithm that jointly learns context representations and policy to improve zero-shot generalization. The key insights are:
- Inferring context from past experiences is more effective when the context representation is tailored to the specific policy being learned, rather than to the full transition dynamics.
- By backpropagating the policy loss through the context encoder, the learned context embeddings capture information relevant to the current policy, enabling better adaptation to unseen environments (a minimal sketch of this joint update appears after this summary).
- Experiments across multiple simulated environments show that the proposed joint learning approach outperforms prior context-learning techniques in zero-shot generalization settings, particularly in more complex environments like Ant.
- The learned context embeddings are shown to better capture the underlying changes in the environment dynamics compared to a decoupled context learning approach.
Overall, this work represents a significant step towards creating more autonomous and versatile RL agents that can effectively adapt to diverse real-world tasks without additional training.
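The summary does not include code, so the following is a minimal PyTorch sketch of the core idea: a single optimizer updates both the context encoder and the policy, so the policy loss shapes the context embedding. The module names, dimensions, window size, and the regression-style surrogate policy loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical context encoder: maps a window of past transitions
# (state, action, next state) to a latent context embedding.
class ContextEncoder(nn.Module):
    def __init__(self, transition_dim, context_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 64), nn.ReLU(),
            nn.Linear(64, context_dim),
        )

    def forward(self, transitions):                 # (batch, window, transition_dim)
        return self.net(transitions).mean(dim=1)    # average over the window

# Context-conditioned policy head (illustrative deterministic output).
class Policy(nn.Module):
    def __init__(self, state_dim, context_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + context_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, context):
        return self.net(torch.cat([state, context], dim=-1))

state_dim, action_dim = 8, 2
transition_dim = state_dim + action_dim + state_dim  # (s, a, s') flattened
context_dim = 4

encoder = ContextEncoder(transition_dim, context_dim)
policy = Policy(state_dim, context_dim, action_dim)

# One optimizer over BOTH modules: the policy-loss gradient flows through
# the context encoder, so the embedding encodes what the policy needs
# rather than being fit to a separate dynamics-reconstruction objective.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(policy.parameters()), lr=3e-4
)

# Dummy batch standing in for replay data (placeholders, not real training).
states = torch.randn(32, state_dim)
past_transitions = torch.randn(32, 10, transition_dim)
target_actions = torch.randn(32, action_dim)         # stand-in policy target

context = encoder(past_transitions)                   # no .detach(): keep the graph
policy_loss = ((policy(states, context) - target_actions) ** 2).mean()

optimizer.zero_grad()
policy_loss.backward()   # gradients reach the encoder through `context`
optimizer.step()
```

A decoupled baseline, the contrast drawn in the summary above, would instead detach the context (`encoder(past_transitions).detach()`) and train the encoder only with a separate dynamics-prediction loss, so the embedding is never shaped by the policy objective.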
Statistics
The paper reports the following key metrics:
- Episodic return (area under the curve) during training across different environments
- Interquartile mean (IQM) of normalized scores in interpolation and extrapolation settings for zero-shot generalization evaluation (a minimal computation is sketched after this list)
- Mean squared error of predicting the ground-truth context from the learned context embeddings
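For reference, the interquartile mean is typically computed by discarding the lowest and highest 25% of the pooled normalized scores and averaging the middle 50%. The sketch below uses hypothetical scores; the paper's exact normalization and pooling protocol may differ.

```python
import numpy as np
from scipy.stats import trim_mean

# Interquartile mean (IQM): drop the bottom and top 25% of scores,
# average the remaining middle 50%.
def interquartile_mean(normalized_scores):
    return trim_mean(np.asarray(normalized_scores).ravel(), proportiontocut=0.25)

# Hypothetical normalized returns pooled over runs and environments
# (illustrative placeholders, not results from the paper).
scores = [0.12, 0.55, 0.61, 0.64, 0.70, 0.73, 0.78, 0.95]
print(f"IQM: {interquartile_mean(scores):.3f}")
```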
Quotes
"By jointly optimizing the context and policy representations, jcpl can discover latent embeddings that encode relevant information about the varying transition dynamics, facilitating improved generalization."
"Our approach of jointly learning behavior-specific context embeddings directly addresses this open question and demonstrates improved generalization performance, especially in complex environments like Ant."