The paper proposes approximate decentralized policy iteration (ADPI) algorithms for cooperative multi-agent Markov decision processes (CO-MA-MDPs); by working with approximate rather than exact value functions, the algorithms can handle large state-action spaces.
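For reference, the coupling in a CO-MA-MDP is usually expressed through a Bellman equation over the joint action. The notation below is a generic formulation assumed for illustration, not necessarily the paper's:

```latex
% Discounted joint Bellman optimality equation for a CO-MA-MDP
% (generic notation, assumed here rather than taken from the paper):
% state s, joint action a=(a^1,\dots,a^n) with one component per agent,
% shared one-stage cost g, transition kernel p, discount \alpha\in(0,1).
V^{*}(s) = \min_{a=(a^{1},\dots,a^{n})}
  \Big[\, g(s,a) + \alpha \sum_{s'} p(s' \mid s, a)\, V^{*}(s') \,\Big]
```

The minimization ranges over the joint action space, whose size is exponential in the number of agents; this is the blow-up that decentralized policy improvement is meant to avoid.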
For finite-horizon CO-MA-MDPs, the algorithm (Algorithm 3) computes an approximate cost-to-go function via approximate linear programming (ALP) and performs decentralized policy iteration, in which each agent improves its policy unilaterally while the other agents' policies are held fixed (sketched below). This contrasts with prior work that relied on exact value functions, which is computationally expensive.
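As an illustration, here is a minimal numpy sketch of one round of unilateral improvement for two agents with a tabular joint cost and transition model. All function names, array shapes, and the two-agent restriction are our assumptions, not the paper's implementation:

```python
import numpy as np

def decentralized_improvement(V_next, g, P, policies):
    """One stage of unilateral policy improvement (finite horizon).

    Hypothetical two-agent shapes (our assumption, not the paper's code):
      V_next   : (S,) approximate cost-to-go for the next stage,
                 e.g. obtained from the ALP step
      g        : (S, A1, A2) shared one-stage cost
      P        : (S, A1, A2, S) joint transition probabilities
      policies : (pi1, pi2), each an (S,) integer array of actions
    Each agent minimizes its one-step lookahead cost while the other
    agent's policy is held fixed, as in decentralized policy iteration.
    """
    S, A1, A2 = g.shape
    pi1, pi2 = policies
    new_pi1 = np.empty(S, dtype=int)
    new_pi2 = np.empty(S, dtype=int)
    for s in range(S):
        # Agent 1 improves unilaterally; agent 2 keeps playing pi2[s].
        q1 = g[s, :, pi2[s]] + P[s, :, pi2[s], :] @ V_next
        new_pi1[s] = int(np.argmin(q1))
        # Agent 2 improves unilaterally; agent 1 keeps playing pi1[s].
        q2 = g[s, pi1[s], :] + P[s, pi1[s], :, :] @ V_next
        new_pi2[s] = int(np.argmin(q2))
    return new_pi1, new_pi2
```

Because each agent searches only its own action set, the per-state work grows like A1 + A2 rather than A1 * A2, which is the motivation for decentralizing the improvement step.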
For infinite-horizon discounted CO-MA-MDPs, the algorithm (Algorithm 5) likewise replaces the exact policy evaluation of prior work with ALP-based approximate policy evaluation.
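The evaluation step can be posed as a linear program over feature weights. Below is a minimal scipy sketch of the standard ALP formulation for evaluating a fixed joint policy; the function name, array shapes, and the uniform state-relevance weights are illustrative assumptions rather than the paper's exact construction:

```python
import numpy as np
from scipy.optimize import linprog

def alp_policy_evaluation(Phi, g_pi, P_pi, alpha=0.95, c=None):
    """ALP-style approximate evaluation of a fixed joint policy pi.

    Assumed shapes (not taken from the paper's code):
      Phi  : (S, k) feature matrix, V approximated as Phi @ r
      g_pi : (S,) one-stage cost under the joint policy pi
      P_pi : (S, S) transition matrix under pi
      c    : (S,) positive state-relevance weights (uniform if None)

    Solves   max_r  c^T (Phi r)
             s.t.   (Phi r)(s) <= g_pi(s) + alpha * (P_pi Phi r)(s)
    so that Phi @ r stays a pointwise lower bound on V_pi.
    """
    S, k = Phi.shape
    if c is None:
        c = np.full(S, 1.0 / S)
    # linprog minimizes, so negate the objective c^T Phi r.
    obj = -(Phi.T @ c)
    # Constraints: (Phi - alpha * P_pi @ Phi) r <= g_pi, one per state.
    A_ub = Phi - alpha * (P_pi @ Phi)
    # Including a constant feature column in Phi keeps the LP bounded.
    res = linprog(obj, A_ub=A_ub, b_ub=g_pi,
                  bounds=[(None, None)] * k, method="highs")
    return Phi @ res.x  # approximate cost-to-go, one value per state
```

The objective pushes the approximation up as far as the feature class allows while the constraints cap it from above; the LP has k variables and S constraints, so it stays tractable when k << S, and constraint sampling is a common remedy when S itself is intractably large.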
Theoretical guarantees for the proposed algorithms show that the policies obtained with the approximate value functions are close to those obtained with exact ones. Experiments on standard multi-agent tasks demonstrate the effectiveness of the proposed algorithms, which outperform prior state-of-the-art methods.
Key ideas extracted from arxiv.org, by Lakshmi Mand..., 05-01-2024. Source: https://arxiv.org/pdf/2311.11789.pdf