The authors analyze Markov potential games under the average-reward criterion and propose policy gradient algorithms that converge to Nash equilibria. They establish time-complexity bounds matching those known for the discounted-reward setting.
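To make the setting concrete, here is a minimal sketch, not the paper's algorithm: independent softmax policy gradient on a two-player, single-state potential game, where the average reward reduces to the expected stage reward. The payoff matrix, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Shared potential: both players' reward differences follow this matrix
# (identical-interest case, the simplest potential game).
potential = np.array([[3.0, 0.0],
                      [0.0, 2.0]])
r1 = potential            # player 1's reward r1(a1, a2)
r2 = potential            # player 2's reward r2(a1, a2)

theta1 = np.zeros(2)      # softmax logits for player 1
theta2 = np.zeros(2)      # softmax logits for player 2
eta = 0.5                 # illustrative step size

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for t in range(2000):
    pi1, pi2 = softmax(theta1), softmax(theta2)
    # Expected reward of each action against the opponent's current mix.
    q1 = r1 @ pi2          # q1[a1] = E_{a2 ~ pi2}[r1(a1, a2)]
    q2 = r2.T @ pi1        # q2[a2] = E_{a1 ~ pi1}[r2(a1, a2)]
    v1, v2 = pi1 @ q1, pi2 @ q2
    # Exact softmax policy gradient: pi(a) * (q(a) - v); each player
    # ascends independently, so the shared potential is non-decreasing.
    theta1 += eta * pi1 * (q1 - v1)
    theta2 += eta * pi2 * (q2 - v2)

print(softmax(theta1), softmax(theta2))  # approaches a pure Nash equilibrium
```

Because both players ascend the same potential, the independent updates jointly increase it, which is the mechanism behind convergence guarantees in potential games; the paper's contribution is extending such guarantees to the average-reward Markov setting.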
Reinforcement learning in high-dimensional problems benefits from abstraction via MDP homomorphisms, which shrink the effective state-action space and enable more efficient policy optimization.
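A minimal sketch of the idea, under simplifying assumptions: when the homomorphism reduces to exact state aggregation (a mapping `phi` merging behaviorally equivalent states), planning can run on the smaller abstract MDP and the value function lifts back to the ground MDP. The 6-state MDP and the mapping `phi` below are hypothetical.

```python
import numpy as np

n_states, n_actions = 6, 2
# phi maps ground states to abstract states; here states are paired off.
phi = np.array([0, 0, 1, 1, 2, 2])
n_abs = phi.max() + 1

rng = np.random.default_rng(0)
# Build the abstract MDP directly; the paired ground states are exact
# duplicates of it, so the aggregation condition holds by construction.
P_abs = rng.dirichlet(np.ones(n_abs), size=(n_abs, n_actions))  # transitions
R_abs = rng.uniform(0.0, 1.0, size=(n_abs, n_actions))          # rewards

def value_iteration(P, R, gamma=0.9, iters=500):
    # Standard value iteration on a tabular MDP.
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

V_abs = value_iteration(P_abs, R_abs)
# Lift the abstract value function to the ground MDP: V(s) = V_abs(phi(s)).
V_ground = V_abs[phi]
print(V_ground)
```

The computational saving is that value iteration runs over 3 abstract states rather than 6 ground states; in high-dimensional problems this gap is what makes the approach worthwhile.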
Global convergence of policy gradient methods for average-reward Markov potential games.