The paper considers Markov Decision Processes (MDPs) with d-dimensional rewards, where the objective is to satisfy the Energy condition in the first dimension (the accumulated reward never drops below 0) and the Mean Payoff condition in the remaining d-1 dimensions (the mean payoff is strictly positive almost surely).
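The two conditions can be stated formally as follows; this is a hedged reconstruction from the description above, and the symbol $r^{(j)}_i$ for the $j$-th reward component at step $i$ is my labeling, not necessarily the paper's notation:

```latex
% Energy: the accumulated first-dimension reward never drops below 0
\text{Energy:}\quad \forall n \ge 0:\ \sum_{i=0}^{n-1} r^{(1)}_i \;\ge\; 0

% Mean Payoff: strictly positive mean payoff, almost surely,
% in each of the remaining d-1 dimensions
\text{Mean Payoff:}\quad \liminf_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} r^{(j)}_i \;>\; 0
\quad\text{for } j = 2,\dots,d
```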
The key insights are:
The authors construct a winning strategy that alternates between two modes: a "Gain" phase that focuses on achieving positive mean payoff, and a "Bailout" phase that focuses on replenishing the energy level. By capping the energy level that the strategy needs to remember, it can be implemented with finite memory while still ensuring almost-sure satisfaction of the Energy-Mean Payoff objective.
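The mode-switching idea can be sketched as a small finite-memory controller. This is a minimal illustration, not the paper's construction: the thresholds `LOW` and `CAP`, the class name, and the switching rule are all hypothetical, chosen only to show how a capped energy counter yields finite memory.

```python
LOW = 5    # hypothetical threshold: switch to Bailout at or below this energy
CAP = 20   # hypothetical cap: energy above CAP is remembered as CAP (finite memory)

class PhaseController:
    """Sketch of a finite-memory Gain/Bailout controller (assumed design)."""

    def __init__(self, energy: int = 10):
        self.energy = min(energy, CAP)  # capping keeps the memory finite
        self.mode = "Gain"

    def step(self, reward_dim0: int) -> str:
        """Update the capped energy with the dimension-1 reward, then pick a mode."""
        self.energy = min(self.energy + reward_dim0, CAP)
        if self.mode == "Gain" and self.energy <= LOW:
            self.mode = "Bailout"   # energy is low: focus on replenishing it
        elif self.mode == "Bailout" and self.energy >= CAP:
            self.mode = "Gain"      # energy replenished: pursue positive mean payoff
        return self.mode

# Example run: energy 10 -> 7 (Gain) -> 4 (Bailout) -> capped at 20 (Gain)
ctrl = PhaseController(energy=10)
print(ctrl.step(-3))   # Gain
print(ctrl.step(-3))   # Bailout
print(ctrl.step(16))   # Gain
```

Because the energy counter is capped at `CAP`, the controller's state space is finite (energy value times current mode), which is the essence of the finite-memory implementability claim.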
The paper also shows that the existence of an almost-surely winning strategy for Energy-Mean Payoff is decidable in pseudo-polynomial time.
Key insights distilled from the paper by Mohan Dantam... at arxiv.org, 04-24-2024
https://arxiv.org/pdf/2404.14522.pdf