Core Concepts
The authors propose a novel robustness concept based on (ξ, η)-rectangularity, which enables efficient dual perturbation (of both the feature maps and the factors) in low-rank Markov decision processes (MDPs). They design an algorithm (R2PG) that provably converges to a robust policy whose suboptimality with respect to the optimal robust policy is bounded.
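For context, a low-rank MDP assumes the transition kernel factorizes through a d-dimensional representation (this is the standard definition; the paper's exact normalization conventions may differ):

$$P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle, \qquad \phi(s, a),\ \mu(s') \in \mathbb{R}^d,$$

where φ is the feature map over state–action pairs and μ is the factor over next states.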
Abstract
The paper addresses the challenge of achieving robustness in reinforcement learning (RL): agents trained in simulated environments often suffer performance degradation when deployed in real-world environments, due to distributional shift between simulation and deployment.
The key insights are:
The authors propose a new robustness concept based on (ξ, η)-rectangularity, which allows for efficient dual perturbation of both the feature maps and the factors in low-rank MDPs. This contrasts with existing methods, which perturb only the features or only the factors.
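One plausible formalization of the dual perturbation set, assuming norm-ball radii purely for illustration (the paper's exact norms and coupling may differ):

$$\mathcal{U}(\xi, \eta) = \left\{ \langle \phi(s,a) + \Delta_\phi(s,a),\ \mu(\cdot) + \Delta_\mu(\cdot) \rangle \ :\ \|\Delta_\phi(s,a)\| \le \xi,\ \|\Delta_\mu\| \le \eta \right\},$$

so the adversary may shift the features by at most ξ and the factors by at most η, independently at each state–action pair, which is what the rectangularity refers to.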
They design the R2PG algorithm, which iteratively evaluates and improves the robust policy. The policy evaluation step involves an optimization problem that can be reduced to a semidefinite program (SDP), making the algorithm computationally efficient.
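R2PG's actual evaluation step solves the SDP tailored to the (ξ, η)-set; as a structural sketch only, the loop below replaces that inner problem with a minimum over a finite set of candidate transition kernels (a simplifying assumption, not the paper's method), which preserves the evaluate-then-improve shape of robust dynamic programming:

```python
import numpy as np

def robust_value_iteration(kernels, R, gamma=0.9, iters=200, tol=1e-8):
    """Robust dynamic programming over a finite uncertainty set.

    kernels : list of (S, A, S) arrays, each a candidate transition model;
              a stand-in for the paper's (xi, eta)-rectangular set.
    R       : (S, A) reward array.
    Returns the robust-greedy deterministic policy and its robust value.
    """
    S, A = R.shape
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Robust Bellman backup: per (s, a), the adversary picks the
        # kernel that minimizes the backed-up value (here a plain min;
        # R2PG instead solves an SDP for its continuous perturbation set).
        Q = np.min([R + gamma * (P @ V) for P in kernels], axis=0)
        V_new = Q.max(axis=1)  # greedy improvement step
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return Q.argmax(axis=1), V

# Tiny usage example on random data.
rng = np.random.default_rng(0)
S, A, K = 4, 2, 3
kernels = [p / p.sum(axis=-1, keepdims=True)
           for p in rng.random((K, S, A, S))]
R = rng.random((S, A))
pi, V = robust_value_iteration(kernels, R)
print("robust-greedy policy:", pi)
```

The finite-kernel min is the simplest adversary that keeps the backup well defined; the paper's SDP reduction plays the same role for the continuous (ξ, η)-ball at each evaluation step.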
The authors provide a convergence guarantee for the R2PG algorithm, showing that the robust value of the output policy is close to that of the optimal robust policy.
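The guarantee plausibly takes the standard form (the symbols below are illustrative; the paper states the exact rate and constants):

$$V_{\mathrm{rob}}^{\pi_{\mathrm{out}}} \ \ge\ V_{\mathrm{rob}}^{\pi^{*}} - \varepsilon(T),$$

where $V_{\mathrm{rob}}^{\pi}$ is the worst-case value of π over the uncertainty set and ε(T) shrinks with the number of iterations T.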
Numerical simulations on a toy model demonstrate the effectiveness of the approach: the output policies become more conservative as the perturbation radius increases.
The paper's main contribution is a robustness concept that is compatible with low-rank representations and thereby enables the design of provably efficient and scalable algorithms for robust RL.
Stats
Beyond the qualitative toy-model simulations, the paper contains no explicit numerical data or statistics; it focuses on the theoretical development of the robust low-rank MDP framework and the R2PG algorithm.