Core Concept
Adaptive regularization of the representation rank of value networks in deep reinforcement learning based on the implicit constraints imposed by the Bellman equation.
Abstract
The paper investigates the role of representation rank, which measures the expressive capacity of value networks in deep reinforcement learning (DRL). Existing studies focus on unboundedly maximizing this rank, which can lead to overly complex models that undermine performance.
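For concreteness, representation rank is often quantified as the effective rank of a batch of features from the value network's penultimate layer, computed from their singular values. The sketch below uses this common proxy; the layer choice and the threshold are illustrative assumptions and may not match the paper's exact definition.

```python
# A minimal sketch of one common proxy for representation rank: the effective
# rank of a batch of penultimate-layer features, read off the singular-value
# spectrum. The threshold `delta` and the choice of layer are illustrative
# assumptions, not necessarily the paper's exact definition.
import torch

def effective_rank(features: torch.Tensor, delta: float = 0.01) -> int:
    """features: (batch_size, feature_dim) matrix of state-action representations."""
    singular_values = torch.linalg.svdvals(features)      # sorted in descending order
    cumulative = torch.cumsum(singular_values, dim=0)
    threshold = (1.0 - delta) * float(singular_values.sum())
    # Smallest k whose top-k singular values capture a (1 - delta) share of the mass.
    return int((cumulative < threshold).sum().item()) + 1

# Usage: features squeezed onto few directions have low effective rank.
low_rank = torch.randn(256, 8) @ torch.randn(8, 64)   # at most rank 8
full_rank = torch.randn(256, 64)
print(effective_rank(low_rank), effective_rank(full_rank))
```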
The authors derive an upper bound on the cosine similarity of consecutive state-action pair representations of value networks, based on the Bellman equation. This upper bound implicitly constrains the representation rank. Motivated by this finding, the authors propose a novel regularizer called BEllman Equation-based automatic rank Regularizer (BEER), which adaptively regularizes the representation rank to improve DRL agent performance.
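To make the derivation concrete, here is a minimal sketch of how such a bound can arise, assuming a linear value head Q(s, a) = w^T φ(s, a), a deterministic transition and policy (so the Bellman expectation drops), and the Bellman equation holding exactly; the paper's precise statement and conditions may differ.

```latex
% Sketch (assumed setting): linear value head Q(s,a) = w^T \phi(s,a),
% deterministic Bellman equation Q(s,a) = r(s,a) + \gamma Q(s',a').
\begin{align*}
  w^\top\bigl(\phi(s,a) - \gamma\,\phi(s',a')\bigr) &= r(s,a), \\
  \lvert r(s,a) \rvert &\le \lVert w \rVert\,\bigl\lVert \phi(s,a) - \gamma\,\phi(s',a') \bigr\rVert
      \quad \text{(Cauchy--Schwarz)}, \\
  \cos\bigl(\phi(s,a),\, \phi(s',a')\bigr)
      &\le \frac{\lVert \phi(s,a) \rVert^{2} + \gamma^{2}\lVert \phi(s',a') \rVert^{2}
                 - r(s,a)^{2}/\lVert w \rVert^{2}}
                {2\gamma\,\lVert \phi(s,a) \rVert\,\lVert \phi(s',a') \rVert}
      \quad \text{(square and rearrange)}.
\end{align*}
```

The right-hand side involves only the representation norms, the reward, the last-layer weights, and the discount factor, so the Bellman equation ties how similar consecutive representations may be, and hence the achievable representation rank, to quantities already available during training.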
The key highlights are:
- Derivation of an upper bound on the cosine similarity of consecutive state-action pair representations, based on the Bellman equation.
- Introduction of the BEER regularizer that adaptively controls the representation rank by adhering to the constraints imposed by the Bellman equation (a rough sketch of such a penalty follows this list).
- Validation of BEER's effectiveness on illustrative experiments and, scaled up, on challenging continuous control tasks, where it outperforms existing methods by a large margin.
- Demonstration of BEER's ability to reduce the approximation error of value function estimation compared to other algorithms.
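The regularizer highlighted above can be sketched roughly as follows: penalize cosine similarities between consecutive representations that violate a Bellman-implied bound of the kind derived earlier. The bound's exact form, the function signature, and the weighting against the TD loss are illustrative assumptions, not the paper's implementation.

```python
# Illustrative BEER-style penalty: discourage consecutive state-action
# representations from violating a Bellman-implied upper bound on their
# cosine similarity. The bound below reuses the earlier sketch (linear value
# head, exact Bellman equation) and is an assumption, not the paper's code.
import torch
import torch.nn.functional as F

def beer_style_penalty(phi: torch.Tensor, phi_next: torch.Tensor,
                       w: torch.Tensor, reward: torch.Tensor,
                       gamma: float, eps: float = 1e-8) -> torch.Tensor:
    """phi, phi_next: (B, d) representations of (s, a) and (s', a');
    w: (d,) last-layer weights of the value head; reward: (B,)."""
    cos_sim = F.cosine_similarity(phi, phi_next, dim=-1)          # (B,)
    n, n_next = phi.norm(dim=-1), phi_next.norm(dim=-1)           # (B,)
    # Adaptive bound: depends on current norms, rewards, and last-layer weights,
    # so the strength of the constraint changes as training proceeds.
    bound = (n ** 2 + (gamma * n_next) ** 2 - (reward / (w.norm() + eps)) ** 2) \
            / (2.0 * gamma * n * n_next + eps)
    # Penalize only violations, leaving similarities within the bound untouched.
    return F.relu(cos_sim - bound).mean()

# Typical wiring (assumed): total_loss = td_loss + beta * beer_style_penalty(...)
```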
Statistics
The representation rank is bounded by factors such as the discount factor, the weights of the last layer of the neural network, and the norms of the representations.
The cosine similarity between consecutive state-action pair representations is upper bounded by a function of the representation norms and the discount factor.
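As a hedged numeric illustration of this dependence, the snippet below evaluates the bound form sketched earlier (which may differ from the paper's exact expression) for a few discount factors and last-layer weight norms, with unit representation norms and unit reward.

```python
# Numeric illustration (assumed bound form from the earlier sketch): the
# admissible cosine similarity between consecutive representations shifts
# with the discount factor and the last-layer weight norm.
def cosine_bound(norm_phi: float, norm_phi_next: float,
                 reward: float, w_norm: float, gamma: float) -> float:
    return (norm_phi ** 2 + (gamma * norm_phi_next) ** 2 - (reward / w_norm) ** 2) \
           / (2.0 * gamma * norm_phi * norm_phi_next)

for gamma in (0.9, 0.99):
    for w_norm in (1.0, 5.0):
        b = cosine_bound(1.0, 1.0, reward=1.0, w_norm=w_norm, gamma=gamma)
        print(f"gamma={gamma}, ||w||={w_norm}: bound on cosine similarity ~ {b:.2f}")
```

In this sketched form, smaller last-layer weight norms and larger rewards tighten the bound, while the discount factor scales how strongly the norm of the next representation enters.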
Quotes
The implication is profound: The cosine similarity is constrained, thereby restraining the representation rank itself.