Core Concepts
The core message of this work is that long-run average reward polytopic robust Markov decision processes (RMDPs) can be reduced in linear time to long-run average reward turn-based stochastic games (TBSGs). This reduction enables the transfer of results from the TBSG literature to the RMDP setting, leading to novel insights into the computational complexity of, and efficient algorithms for, solving long-run average reward polytopic RMDPs.
Abstract
The authors consider the problem of solving long-run average reward robust Markov decision processes (RMDPs) with polytopic uncertainty sets. They present a novel perspective on this problem by showing that it can be reduced in linear time to the problem of solving long-run average reward turn-based stochastic games (TBSGs).
The key highlights and insights are:
Reduction to TBSGs: The authors formally define a linear-time reduction from polytopic RMDPs to TBSGs. This reduction allows them to leverage results from the TBSG literature to obtain new insights on RMDPs.
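The reduction can be read as turning the uncertainty set into an explicit adversary: Player 1 chooses actions, and Player 2 chooses a vertex of the transition polytope P(s, a). The following is a minimal sketch of that construction under this standard adversarial reading; the names and data layout are illustrative, not the authors' implementation.

```python
def rmdp_to_tbsg(states, actions, vertices, reward):
    """Turn a polytopic RMDP into a turn-based stochastic game.

    vertices[(s, a)] lists the vertices of the polytope P(s, a);
    each vertex is a dict mapping successor states to probabilities.
    """
    max_states = list(states)     # Player 1 (decision maker) states
    min_states = list(vertices)   # Player 2 (adversary) states: (s, a) pairs
    moves = {}

    for s in states:
        # Player 1 chooses an action, handing control to the adversary.
        moves[s] = {a: (s, a) for a in actions[s]}
    for (s, a), vs in vertices.items():
        # Player 2 chooses a polytope vertex, i.e. a concrete distribution.
        moves[(s, a)] = {i: dist for i, dist in enumerate(vs)}

    # Rewards are inherited from the RMDP on the (s, a) pairs.
    game_reward = {sa: reward[sa] for sa in vertices}
    return max_states, min_states, moves, game_reward
```

Each polytope vertex appears exactly once in the game, so the output has size linear in the representation of the RMDP, which is consistent with the paper's linear-time claim.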
Computational Complexity: Using the reduction, the authors show that the threshold decision problem for long-run average reward polytopic RMDPs is in NP ∩ coNP. They also show that these RMDPs admit a randomized algorithm with sub-exponential expected runtime.
Efficient Algorithms: The authors propose Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. RPPI does not impose any structural restrictions on the RMDP, unlike prior value iteration-based algorithms.
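The summary does not spell out RPPI's steps. As a point of reference, the non-robust building block that policy iteration methods for average reward extend is the classical gain/bias policy iteration on a unichain MDP, sketched below. This is a standard textbook scheme, not the authors' RPPI algorithm: each round evaluates the current policy by solving the gain/bias equations, then improves it greedily.

```python
import numpy as np

def evaluate(P, r):
    """Gain/bias evaluation of a fixed policy in a unichain MDP.

    Solves g + h = r + P h with h[0] = 0 (reference state pinned),
    returning the gain g and the bias vector h.
    """
    n = len(r)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                              # coefficient of the gain g
    A[:, 1:] = np.eye(n)[:, 1:] - P[:, 1:]     # (I - P) restricted to h[1:]
    x = np.linalg.solve(A, r)
    return x[0], np.concatenate(([0.0], x[1:]))

def policy_iteration(P_sa, r_sa):
    """Average-reward policy iteration (unichain case).

    P_sa has shape (n_states, n_actions, n_states); r_sa has shape
    (n_states, n_actions). Returns the final policy and its gain.
    """
    n_states = r_sa.shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        idx = np.arange(n_states)
        g, h = evaluate(P_sa[idx, policy], r_sa[idx, policy])
        # Greedy improvement w.r.t. r(s, a) + sum_s' P(s, a, s') h(s').
        new_policy = np.argmax(r_sa + P_sa @ h, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, g
        policy = new_policy
```

RPPI operates in the robust setting, where evaluation must account for the worst-case choice within each transition polytope, but the evaluate/improve loop above conveys the overall shape of a policy iteration method.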
Experimental Evaluation: The authors implement RPPI and experimentally compare it against state-of-the-art value iteration-based methods. The results demonstrate significant computational runtime gains provided by the policy iteration-based RPPI, especially on non-unichain polytopic RMDPs to which existing methods are not applicable.
Stats
No standalone key metrics or figures are highlighted in support of the authors' main arguments.
Quotes
No striking quotes are highlighted in support of the authors' main arguments.