The authors consider the problem of solving long-run average reward robust Markov decision processes (RMDPs) with polytopic uncertainty sets. They present a novel perspective on this problem by showing that it can be reduced in linear time to the problem of solving long-run average reward turn-based stochastic games (TBSGs).
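For reference, the robust long-run average (mean-payoff) objective studied in this setting is typically written as

$$\sup_{\pi}\ \inf_{P \in \mathcal{P}}\ \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi,P}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right],$$

where $\mathcal{P}$ is the polytopic uncertainty set of transition kernels against which the controller's policy $\pi$ is evaluated in the worst case. The notation here is standard and only illustrative of the setting; see the paper for the authors' exact definitions.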
The key highlights and insights are:
Reduction to TBSGs: The authors formally define a linear-time reduction from polytopic RMDPs to TBSGs. This reduction allows them to leverage results from the TBSG literature to obtain new insights on RMDPs (the intuition behind such a reduction is sketched after this list).
Computational Complexity: Using the reduction, the authors show that the threshold decision problem for long-run average reward polytopic RMDPs is in NP ∩ coNP. They also show that these RMDPs can be solved by a randomized algorithm with sub-exponential expected runtime.
Efficient Algorithms: The authors propose Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Unlike prior value iteration-based algorithms, RPPI imposes no structural (e.g., unichain) restrictions on the RMDP (a generic strategy-iteration skeleton is sketched after this list).
Experimental Evaluation: The authors implement RPPI and experimentally compare it against state-of-the-art value iteration-based methods. The results demonstrate significant runtime gains for the policy iteration-based RPPI, especially on non-unichain polytopic RMDPs, to which the existing methods are not applicable.
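To convey the intuition behind a reduction of this kind (this is not the authors' construction; the function name `rmdp_to_tbsg` and the vertex-list input format are assumptions made for the sketch): when the uncertainty set for each state-action pair is a polytope, nature's worst case is attained at one of its vertices, so nature can be modeled as a second player who explicitly picks a vertex in a turn-based game.

```python
# Illustrative sketch only (not the authors' implementation): turning a
# polytopic RMDP into a turn-based stochastic game (TBSG). Assumes the
# uncertainty set of each state-action pair is given explicitly by its
# vertices, i.e. a list of candidate transition distributions.

def rmdp_to_tbsg(states, actions, vertices, reward):
    """
    states   : iterable of states
    actions  : dict mapping state -> list of available actions
    vertices : dict mapping (state, action) -> list of transition
               distributions (each a dict next_state -> probability),
               one per vertex of the polytopic uncertainty set
    reward   : dict mapping (state, action) -> immediate reward

    Returns a TBSG in which the maximizer (controller) owns the original
    states and the minimizer (nature) owns auxiliary states, one per
    (state, action) pair, where it picks a vertex distribution.
    """
    max_states = list(states)
    min_states = [(s, a) for s in states for a in actions[s]]

    # Controller move: in state s, choosing action a leads deterministically
    # to nature's state (s, a) and collects reward[(s, a)].
    max_moves = {
        s: [((s, a), reward[(s, a)]) for a in actions[s]]
        for s in states
    }

    # Nature move: in state (s, a), choosing vertex distribution `dist`
    # induces the corresponding stochastic transition, with reward 0.
    # NOTE: a careful reduction must account for the extra intermediate step
    # so that the long-run average is preserved (e.g. by rescaling rewards);
    # this sketch glosses over that bookkeeping.
    min_moves = {
        (s, a): [(dist, 0.0) for dist in vertices[(s, a)]]
        for (s, a) in min_states
    }

    return max_states, min_states, max_moves, min_moves
```

In this game view the controller maximizes and nature minimizes the mean payoff, which is how the complexity results (NP ∩ coNP membership, sub-exponential randomized algorithms) and policy iteration methods for TBSGs carry over to polytopic RMDPs.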
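The following is a minimal sketch of the generic policy (strategy) iteration loop that an algorithm like RPPI instantiates. Here `evaluate_policy` and `improve_policy` are hypothetical placeholders: the paper's actual evaluation and improvement steps for long-run average reward polytopic RMDPs are what RPPI contributes, and are not reproduced here.

```python
# Generic strategy-iteration skeleton for a long-run average (mean-payoff)
# objective, shown only to illustrate the overall loop structure of a
# policy-iteration method. The two callables are placeholders.

def strategy_iteration(game, initial_policy, evaluate_policy, improve_policy):
    policy = initial_policy
    while True:
        # Policy evaluation: compute the gain (long-run average reward) and
        # bias of the current controller policy against a best-responding
        # adversary (nature picking worst-case polytope vertices).
        gain, bias = evaluate_policy(game, policy)

        # Policy improvement: switch to locally better actions with respect
        # to (gain, bias); if no switch improves, the policy is optimal.
        new_policy, improved = improve_policy(game, policy, gain, bias)
        if not improved:
            return policy, gain
        policy = new_policy
```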