Solving Long-run Average Reward Robust Markov Decision Processes via Reduction to Turn-based Stochastic Games
The core message of this work is that the problem of solving long-run average reward polytopic robust Markov decision processes (RMDPs) can be reduced in linear time to the problem of solving long-run average reward turn-based stochastic games (TBSGs). This reduction enables the transfer of results from the TBSG literature to the RMDP setting, yielding new insights into the computational complexity of long-run average reward polytopic RMDPs and efficient algorithms for solving them.
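To make the shape of such a reduction concrete, the following is a minimal sketch, not the paper's construction: it assumes the polytopic uncertainty set of each state-action pair is given by its list of vertex distributions, and builds a TBSG in which the maximizer plays the RMDP's actions while the minimizer (the adversarial environment) picks a polytope vertex. All names and data structures here are illustrative.

```python
# Illustrative polytopic RMDP (not from the paper): for each (state, action),
# the uncertainty set over successor distributions is a polytope given by
# its vertices, each vertex being a probability distribution over states.
rmdp = {
    "s0": {"a": {"reward": 1.0,
                 "vertices": [{"s0": 0.5, "s1": 0.5}, {"s0": 0.9, "s1": 0.1}]}},
    "s1": {"b": {"reward": 0.0,
                 "vertices": [{"s0": 1.0}]}},
}

def rmdp_to_tbsg(rmdp):
    """Sketch of a reduction from a vertex-represented polytopic RMDP to a TBSG.

    Max-player states are the RMDP states; after Max picks action `a` in
    state `s`, control passes to a Min-player state (s, a) whose actions are
    the vertices of the uncertainty polytope, each leading to a fixed
    successor distribution.  The construction is linear in the size of the
    vertex representation of the RMDP.
    """
    max_states, min_states = {}, {}
    for s, actions in rmdp.items():
        max_states[s] = {}
        for a, spec in actions.items():
            # Max's move collects the reward and hands control to Min.
            max_states[s][a] = {"reward": spec["reward"], "next": (s, a)}
            # Min's moves: one action per polytope vertex.
            min_states[(s, a)] = list(spec["vertices"])
    return max_states, min_states

max_states, min_states = rmdp_to_tbsg(rmdp)
```

Under this encoding, the adversary's freedom to choose any point of the polytope collapses to choosing a vertex, which is without loss of generality for long-run average objectives since the worst case is attained at a vertex.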