The paper considers Markov Decision Processes (MDPs) with d-dimensional rewards, where the objective is to satisfy the Energy condition in the first dimension (the accumulated reward never drops below 0) and the Mean Payoff condition in the remaining d-1 dimensions (the mean payoff is strictly positive almost surely).
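Under one common formalization (the paper's precise definitions may differ in details such as the initial credit), for an infinite run with reward vectors $\vec{r}_0, \vec{r}_1, \ldots \in \mathbb{Z}^d$, the two conditions can be sketched as:

```latex
% Sketch of the two conditions; the initial credit c >= 0 is an assumption.
\text{Energy:}\quad \forall n \ge 0:\; c + \sum_{i=0}^{n-1} \vec{r}_i(1) \;\ge\; 0
\qquad
\text{Mean Payoff:}\quad \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \vec{r}_i(j) \;>\; 0
\quad \text{for } j = 2, \ldots, d
```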
The key insight is the construction of a winning strategy that alternates between two modes: a "Gain" phase, which focuses on achieving positive mean payoff, and a "Bailout" phase, which focuses on replenishing the energy level. By capping the energy level that must be remembered, the strategy can be implemented with finite memory while still ensuring almost-sure satisfaction of the Energy-Mean Payoff objective.
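The Gain/Bailout alternation with a capped energy counter can be sketched as a small finite-memory controller. This is an illustrative sketch, not the paper's construction: the class name, the thresholds `low` and `cap`, and the initial energy value are all hypothetical parameters chosen for the example.

```python
class GainBailoutController:
    """Illustrative finite-memory phase switcher: tracks the first-dimension
    accumulated reward only up to a cap, and alternates between a 'Gain'
    phase and a 'Bailout' phase with hysteresis (hypothetical thresholds)."""

    def __init__(self, low: int = 10, cap: int = 100):
        assert 0 < low < cap
        self.low = low      # below this level, switch to Bailout
        self.cap = cap      # energy is remembered only up to this cap
        self.energy = cap   # assumed initial credit (illustrative choice)
        self.phase = "Gain"

    def observe_reward(self, first_dim_reward: int) -> str:
        # Update the capped energy counter with the first-dimension reward.
        self.energy = min(self.cap, self.energy + first_dim_reward)
        # Hysteresis: stay in Bailout until the energy is fully replenished,
        # so the two phases do not oscillate on every small fluctuation.
        if self.phase == "Gain" and self.energy < self.low:
            self.phase = "Bailout"
        elif self.phase == "Bailout" and self.energy >= self.cap:
            self.phase = "Gain"
        return self.phase
```

Because the counter is capped, the controller's state space is finite (`cap + 1` energy values times two phases), which is the sense in which the strategy uses finite memory.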
The paper also shows that the existence of an almost-surely winning strategy for Energy-Mean Payoff is decidable in pseudo-polynomial time.
Source: Mohan Dantam et al., arxiv.org, April 24, 2024. https://arxiv.org/pdf/2404.14522.pdf