The authors consider the problem of solving long-run average reward robust Markov decision processes (RMDPs) with polytopic uncertainty sets. They present a novel perspective on this problem by showing that it can be reduced in linear time to the problem of solving long-run average reward turn-based stochastic games (TBSGs).
The key highlights and insights are:
Reduction to TBSGs: The authors formally define a linear-time reduction from polytopic RMDPs to TBSGs. This reduction allows them to leverage results from the TBSG literature to obtain new insights on RMDPs (one possible construction is sketched after this list).
Computational Complexity: Using the reduction, the authors show that the threshold decision problem for long-run average reward polytopic RMDPs is in NP ∩ coNP. They also show that these RMDPs admit a randomized algorithm with sub-exponential expected runtime (the threshold problem is stated formally after this list).
Efficient Algorithms: The authors propose Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Unlike prior value iteration-based algorithms, RPPI does not impose any structural restrictions on the RMDP (a simplified policy-iteration sketch appears after this list).
Experimental Evaluation: The authors implement RPPI and experimentally compare it against state-of-the-art value iteration-based methods. The results demonstrate significant computational runtime gains provided by the policy iteration-based RPPI, especially on non-unichain polytopic RMDPs, to which existing methods are not applicable.
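To make the reduction concrete, the following is a minimal sketch of one natural construction consistent with the summary above: the decision maker (Max) keeps the original states and picks actions, while the adversary (Min) gets a fresh state for every state-action pair and resolves the uncertainty by picking a vertex of the corresponding polytope. The `PolytopicRMDP` and `TBSG` containers and the function `rmdp_to_tbsg` are illustrative names, not the paper's notation.

```python
from dataclasses import dataclass, field

@dataclass
class PolytopicRMDP:
    """Polytopic RMDP given by the vertices of each uncertainty set."""
    states: list          # state identifiers
    actions: dict         # state -> list of available actions
    vertices: dict        # (state, action) -> list of extreme transition
                          #   distributions, each a dict  state -> probability
    reward: dict          # (state, action) -> reward

@dataclass
class TBSG:
    """Turn-based stochastic game produced by the reduction."""
    max_states: list = field(default_factory=list)
    min_states: list = field(default_factory=list)
    max_edges: dict = field(default_factory=dict)   # (s, a) -> (Min state, reward)
    min_edges: dict = field(default_factory=dict)   # (Min state, i) -> distribution

def rmdp_to_tbsg(rmdp: PolytopicRMDP) -> TBSG:
    """Max keeps the RMDP's states and actions; Min gets one state per
    state-action pair and picks a polytope vertex there. The construction
    touches every vertex once, hence it is linear in the input size."""
    game = TBSG(max_states=list(rmdp.states))
    for s in rmdp.states:
        for a in rmdp.actions[s]:
            m = (s, a)                                   # the adversary's state
            game.min_states.append(m)
            game.max_edges[(s, a)] = (m, rmdp.reward[(s, a)])
            for i, dist in enumerate(rmdp.vertices[(s, a)]):
                game.min_edges[(m, i)] = dist            # stochastic step back to Max states
    return game
```

In such a game, a Max strategy corresponds to a policy of the RMDP and a Min strategy to a vertex-valued resolution of the uncertainty, which is what allows results on long-run average reward TBSGs to transfer back to the RMDP.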
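For reference, the threshold decision problem mentioned above asks whether the decision maker can guarantee a worst-case long-run average reward of at least a given threshold λ. In a standard sup-inf formulation (the symbols are generic and the paper's exact convention, e.g. lim inf versus lim sup, may differ), it reads:

\[
\sup_{\pi}\; \inf_{\kappa}\; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi,\kappa}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right] \;\geq\; \lambda,
\]

where \(\pi\) ranges over the decision maker's policies and \(\kappa\) over adversarial resolutions of the polytopic uncertainty sets.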
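The paper's RPPI algorithm itself is not reproduced in this summary, so the following is only a simplified policy-iteration sketch in the same spirit, under a unichain assumption that RPPI, notably, does not need: evaluate the current policy against the adversary's best response (itself computed by policy iteration over polytope vertices), then greedily improve the policy using the resulting bias values. The function names, the 1e-9 tolerance, and the expected fields of the `rmdp` object (`states`, `actions`, `vertices`, `reward`, as in the reduction sketch above) are assumptions for illustration.

```python
import numpy as np

def evaluate(P, r):
    """Gain/bias evaluation of a fixed Markov chain (unichain assumption).
    Solves g + h[s] - sum_s' P[s, s'] * h[s'] = r[s] with h[0] = 0."""
    n = len(r)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                                  # coefficient of the scalar gain g
    A[:, 1:] = np.eye(n)[:, 1:] - P[:, 1:]         # coefficients of h[1..n-1]
    x = np.linalg.solve(A, r)
    return x[0], np.concatenate(([0.0], x[1:]))    # gain, bias

def worst_case(rmdp, pi, idx):
    """Adversary's best response to policy pi: policy iteration over vertices."""
    n = len(rmdp.states)
    choice = {s: 0 for s in rmdp.states}           # arbitrary initial vertex choice
    while True:
        P, r = np.zeros((n, n)), np.zeros(n)
        for s in rmdp.states:
            a = pi[s]
            r[idx[s]] = rmdp.reward[(s, a)]
            for s2, p in rmdp.vertices[(s, a)][choice[s]].items():
                P[idx[s], idx[s2]] = p
        g, h = evaluate(P, r)
        improved = False
        for s in rmdp.states:
            vals = [sum(p * h[idx[s2]] for s2, p in d.items())
                    for d in rmdp.vertices[(s, pi[s])]]
            best = int(np.argmin(vals))
            if vals[best] < vals[choice[s]] - 1e-9:
                choice[s], improved = best, True
        if not improved:
            return g, h

def robust_policy_iteration(rmdp):
    """Policy iteration for worst-case long-run average reward (unichain sketch)."""
    idx = {s: i for i, s in enumerate(rmdp.states)}
    pi = {s: rmdp.actions[s][0] for s in rmdp.states}   # arbitrary initial policy
    while True:
        g, h = worst_case(rmdp, pi, idx)
        improved = False
        for s in rmdp.states:
            def q(a):   # robust one-step lookahead on the current bias h
                return rmdp.reward[(s, a)] + min(
                    sum(p * h[idx[s2]] for s2, p in d.items())
                    for d in rmdp.vertices[(s, a)])
            best = max(rmdp.actions[s], key=q)
            if q(best) > q(pi[s]) + 1e-9:
                pi[s], improved = best, True
        if not improved:
            return pi, g                                 # policy and its worst-case gain
```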
Source: https://arxiv.org/pdf/2312.13912.pdf