Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Key Points
Bisimulation metrics provide a quantitative generalization of exact bisimulation equivalence, measuring how behaviorally similar two states of an MDP are (a standard fixed-point characterization is sketched after this list).
Continuous MDP homomorphisms extend the classical, discrete notion of an MDP homomorphism to continuous state and action spaces.
The stochastic homomorphic policy gradient (HPG) theorem integrates MDP homomorphisms into the policy gradient, so that policy optimization can be carried out in the smaller abstract MDP (see the sketch after this list).
The value equivalence property holds for lifted policies in continuous settings: a policy defined on the abstract MDP, when lifted back to the actual MDP, attains the same value under the state map.
A homomorphism is given by surjective maps f (on states) and g_s (on actions) that ensure invariance of rewards and equivariance of transitions (conditions sketched after this list).
Optimal value equivalence is also preserved under continuous MDP homomorphisms: the optimal value functions of the actual and abstract MDPs agree under the state map f.
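
For reference, a common fixed-point characterization of the bisimulation metric (in the style of Ferns et al.), written in generic notation rather than the paper's exact formulation; here r is the reward, P the transition kernel, \(\mathcal{W}_d\) the Kantorovich (Wasserstein-1) distance induced by d, and \(c_R, c_T\) weighting constants:

\[
d(s, s') \;=\; \max_{a \in A} \Big( c_R \,\big\lvert r(s, a) - r(s', a) \big\rvert \;+\; c_T \, \mathcal{W}_d\big( P(\cdot \mid s, a),\; P(\cdot \mid s', a) \big) \Big)
\]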
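
A minimal sketch of the two homomorphism conditions, assuming an actual MDP \((S, A, r, P)\) and an abstract MDP \((\bar{S}, \bar{A}, \bar{r}, \bar{P})\); the measure-theoretic form of transition equivariance below is one standard way to state it for continuous spaces and may differ cosmetically from the paper's statement:

\[
\bar{r}\big(f(s), g_s(a)\big) = r(s, a),
\qquad
\bar{P}\big(\bar{B} \mid f(s), g_s(a)\big) = P\big(f^{-1}(\bar{B}) \mid s, a\big)
\quad \text{for all measurable } \bar{B} \subseteq \bar{S}.
\]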
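
Value equivalence and optimal value equivalence can then be sketched as follows, where \(\bar{\pi}^{\uparrow}\) denotes the lift of an abstract policy \(\bar{\pi}\) to the actual MDP:

\[
V^{\bar{\pi}^{\uparrow}}(s) \;=\; \bar{V}^{\bar{\pi}}\big(f(s)\big)
\qquad \text{and} \qquad
V^{*}(s) \;=\; \bar{V}^{*}\big(f(s)\big)
\quad \text{for all } s \in S.
\]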
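
Roughly, the stochastic HPG theorem says the performance gradient of the lifted policy equals the standard policy gradient evaluated entirely in the abstract MDP. The sketch below uses generic actor-critic notation (abstract state distribution \(\bar{\rho}\), abstract critic \(\bar{Q}\)) and is an illustrative form, not the paper's exact statement:

\[
\nabla_{\theta}\, J\big(\bar{\pi}_{\theta}^{\uparrow}\big)
\;=\;
\mathbb{E}_{\bar{s} \sim \bar{\rho}^{\bar{\pi}_{\theta}},\; \bar{a} \sim \bar{\pi}_{\theta}(\cdot \mid \bar{s})}
\Big[\, \bar{Q}^{\bar{\pi}_{\theta}}(\bar{s}, \bar{a}) \,\nabla_{\theta} \log \bar{\pi}_{\theta}(\bar{a} \mid \bar{s}) \,\Big].
\]

Because the expectation ranges over abstract states and actions, gradient estimation scales with the abstract MDP rather than the original one.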
Quotes
"Reinforcement learning on high-dimensional problems benefits from abstraction and MDP homomorphisms, enabling efficient policy optimization."
"The stochastic HPG theorem integrates MDP homomorphisms into policy gradients for efficient optimization."
"The value equivalence property holds for lifted policies in continuous settings."