Key Concepts
Finite-memory strategies suffice for almost-surely winning the Energy-Mean Payoff objective in Markov Decision Processes, even though infinite memory is required for the closely related Energy-Parity objective.
Summary
The paper considers Markov Decision Processes (MDPs) with d-dimensional rewards, where the objective is to satisfy the Energy condition in the first dimension (the accumulated reward never drops below 0) and the Mean Payoff condition in the remaining d-1 dimensions (the mean payoff is strictly positive almost surely).
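One way to write the two conditions down is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $\rho = \rho_0 \rho_1 \dots$ is a run, $r_j(\rho_i)$ is the reward in dimension $j$ at step $i$, $k$ is the initial energy level, and the mean payoff is taken as a limit inferior.

```latex
\[
\mathrm{EN}(k) \;=\; \Bigl\{ \rho \;:\; k + \sum_{i=0}^{n-1} r_1(\rho_i) \,\ge\, 0 \ \text{ for all } n \ge 0 \Bigr\}
\]
\[
\mathrm{MP}_{>0} \;=\; \Bigl\{ \rho \;:\; \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} r_j(\rho_i) \,>\, 0 \ \text{ for all } j = 2, \dots, d \Bigr\}
\]
\[
\sigma \text{ is almost-surely winning from state } s \text{ with initial energy } k
\quad\Longleftrightarrow\quad
\mathbb{P}^{\sigma}_{s}\bigl(\mathrm{EN}(k) \cap \mathrm{MP}_{>0}\bigr) = 1
\]
```

Under this reading, the energy condition must hold on every prefix of the run, while the mean payoff condition only constrains the long-run average.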
The key insights are:
- Finite-memory strategies suffice for almost-surely winning the Energy-Mean Payoff objective, in contrast to the Energy-Parity objective, which requires infinite memory in general.
- Deterministic strategies with an exponential number of memory modes are sufficient for almost-surely winning the Energy-Mean Payoff objective.
- An exponential number of memory modes is also necessary, even for randomized strategies.
The authors construct a winning strategy that alternates between two modes: a "Gain" phase that focuses on achieving positive mean payoff, and a "Bailout" phase that focuses on replenishing the energy level. By bounding the energy level that needs to be remembered, the strategy can be implemented with finite memory, while still ensuring the almost-sure satisfaction of the Energy-Mean Payoff objective.
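The memory structure described above can be illustrated with a minimal Python sketch. It is not the paper's construction: the class `TwoModeMemory`, the parameters `cap` and `low`, and the switching rule are hypothetical choices made only to show how a mode flag plus a capped energy counter yields finitely many memory modes.

```python
from dataclasses import dataclass

GAIN, BAILOUT = "gain", "bailout"


@dataclass
class TwoModeMemory:
    """Finite memory: a mode flag plus an energy counter capped at `cap`.

    All names and thresholds here are illustrative assumptions.
    """
    cap: int      # hypothetical bound on the energy level that is remembered
    low: int      # hypothetical threshold at which Bailout is triggered
    energy: int   # remembered energy, kept in the range [0, cap]
    mode: str = GAIN

    def update(self, energy_reward: int) -> str:
        """Apply one step's energy reward and return the mode to play next."""
        # Remember the energy only up to the cap. The stored value is a lower
        # bound on the true energy, so acting on it is never over-optimistic.
        self.energy = min(self.cap, self.energy + energy_reward)
        if self.energy < 0:
            raise ValueError("energy condition violated")
        if self.mode == GAIN and self.energy <= self.low:
            # Energy is running low: switch to replenishing it.
            self.mode = BAILOUT
        elif self.mode == BAILOUT and self.energy >= self.cap:
            # Energy is back at the cap: resume pursuing mean payoff.
            self.mode = GAIN
        return self.mode

    def num_memory_modes(self) -> int:
        """Number of distinct (mode, remembered energy) pairs."""
        return 2 * (self.cap + 1)
```

For example, `TwoModeMemory(cap=5, low=1, energy=3)` switches to Bailout after a reward of -2 and returns to Gain once the remembered energy reaches 5 again. In this sketch the memory size is `2 * (cap + 1)`, so the memory is finite whenever the cap is.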
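The paper also shows that the existence of an almost-surely winning strategy for Energy-Mean Payoff is decidable in pseudo-polynomial time, that is, in time polynomial in the size of the MDP and in the magnitude of the rewards rather than in the length of their binary encoding.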
Statistics
The summary identifies no key metrics or figures that support the authors' main arguments.
Quotes
"We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general."
"We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff."