Efficient Dual Perturbation Robustness in Low-rank Markov Decision Processes
The authors propose a novel robustness concept, based on (ξ, η)-rectangularity, that enables efficient dual perturbation robustness in low-rank Markov decision processes (MDPs). They design an algorithm, R2PG, that provably converges to a near-optimal robust policy with bounded suboptimality.