The paper proposes approximate decentralized policy iteration (ADPI) algorithms for cooperative multi-agent Markov decision processes (CO-MA-MDPs) that can handle large state-action spaces.
For finite-horizon CO-MA-MDPs, the proposed algorithm (Algorithm 3) computes an approximate cost function via approximate linear programming (ALP) and then performs decentralized policy iteration, in which each agent improves its policy unilaterally while the policies of the other agents are held fixed (see the sketch below). This contrasts with prior work that relies on exact value functions, which is computationally expensive for large state-action spaces.
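A minimal sketch of the unilateral improvement step is given below. It assumes a tabular representation in which a policy maps each state to a joint action, and a callable approximate Q-function (e.g., built from the ALP-based cost function); the function and variable names are illustrative and not taken from the paper.

```python
def unilateral_improvement(states, policy, agent_action_sets, approx_q):
    """One sweep of decentralized policy improvement: each agent in turn
    switches to the action minimizing the approximate Q-value while the
    other agents' actions are held fixed at their current choices.

    policy: dict mapping state -> tuple of per-agent actions
    agent_action_sets: list of per-agent action lists
    approx_q: callable (state, joint_action) -> float, assumed to be
              derived from the ALP-based approximate cost function
    """
    new_policy = {}
    for s in states:
        joint = list(policy[s])
        for i, actions_i in enumerate(agent_action_sets):
            best_a, best_q = joint[i], float("inf")
            for a_i in actions_i:
                candidate = tuple(joint[:i] + [a_i] + joint[i + 1:])
                q = approx_q(s, candidate)
                if q < best_q:
                    best_q, best_a = q, a_i
            joint[i] = best_a
        new_policy[s] = tuple(joint)
    return new_policy
```

Because each agent searches only over its own action set, the per-state cost of an improvement sweep grows with the sum of the agents' action-space sizes rather than their product, which is the main computational benefit of the decentralized improvement step.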
For infinite-horizon discounted CO-MA-MDPs, the corresponding algorithm (Algorithm 5) likewise replaces exact policy evaluation with ALP-based approximate policy evaluation, again in contrast to prior work.
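For concreteness, the following is a minimal sketch of ALP-based evaluation of a fixed joint policy in a discounted cost-minimization MDP with a linear approximation J ≈ Φr: maximize c^T Φ r subject to Φ r ≤ g_u + γ P_u Φ r, which is the standard ALP form. The variable names and the use of scipy are assumptions for illustration, not details from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def alp_policy_evaluation(P_u, g_u, Phi, c, gamma):
    """Approximate evaluation of a fixed joint policy u via ALP.

    P_u:   (S, S) transition matrix under policy u
    g_u:   (S,)   one-stage costs under policy u
    Phi:   (S, K) feature matrix (a constant column keeps the LP feasible)
    c:     (S,)   nonnegative state-relevance weights
    gamma: discount factor in (0, 1)

    Solves  max_r  c^T Phi r   s.t.  Phi r <= g_u + gamma * P_u Phi r,
    i.e. the approximate cost stays below the policy's Bellman backup.
    Returns the weight vector r and the approximate cost vector Phi r.
    """
    A_ub = Phi - gamma * P_u @ Phi        # (Phi - gamma * P_u * Phi) r <= g_u
    b_ub = g_u
    objective = -(Phi.T @ c)              # linprog minimizes, so negate
    res = linprog(objective, A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    r = res.x
    return r, Phi @ r
```

With a fixed policy the LP has one constraint per state; when the state space itself is too large to enumerate, constraint sampling is a common remedy, and the sketch above keeps the full constraint set only for clarity.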
Theoretical guarantees show that the policies produced by the approximate algorithms remain close to those obtained with exact value functions. Experiments on standard multi-agent tasks demonstrate their effectiveness, with the proposed algorithms outperforming prior state-of-the-art methods.