This paper introduces the N-Agent Ad Hoc Teamwork (NAHT) problem, which generalizes both multi-agent reinforcement learning (MARL) and ad hoc teamwork (AHT). In NAHT, a set of N controlled agents must cooperate with M-N uncontrolled teammates to accomplish a common task, where both the number and the types of teammates can vary dynamically; MARL corresponds to the special case N = M, and AHT to N = 1.
The authors propose the Policy Optimization with Agent Modelling (POAM) algorithm to address the NAHT problem. POAM consists of two key components:
1. An agent modeling network that generates a vector characterizing the behaviors of the encountered teammates, allowing POAM agents to adapt their policies based on the inferred properties of those teammates.
2. An independent actor-critic architecture in which the policy and value networks are conditioned on the learned teammate encoding vectors, enabling POAM agents to coordinate effectively with diverse teammate behaviors.
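The two components above can be illustrated with a deliberately simplified sketch. In the paper the teammate encoding is produced by a learned neural network; here a hand-crafted stand-in (empirical action frequencies of the observed teammate) plays that role, and a linear scoring function stands in for the actor. All function names, dimensions, and weights below are hypothetical, chosen only to show how a policy can be conditioned on a teammate embedding.

```python
from collections import Counter

N_ACTIONS = 4           # hypothetical discrete action space size
OBS_DIM = 3             # hypothetical observation size

def encode_teammates(action_history):
    """Toy stand-in for POAM's learned teammate encoder: summarize the
    observed teammate actions as an empirical action-frequency vector.
    (POAM learns this encoding with a neural network; this hand-crafted
    substitute is only illustrative.)"""
    counts = Counter(action_history)
    total = max(len(action_history), 1)
    return [counts[a] / total for a in range(N_ACTIONS)]

def policy(observation, teammate_embedding, weights):
    """Linear policy conditioned on [observation; embedding], mirroring
    how POAM's actor consumes the teammate encoding alongside its own
    observation. Returns the greedy action."""
    features = list(observation) + list(teammate_embedding)
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return scores.index(max(scores))

# Usage: a teammate that mostly takes action 2 yields an embedding
# peaked at index 2, which can shift the controlled agent's choice.
history = [2, 2, 2, 1, 2]
emb = encode_teammates(history)          # [0.0, 0.2, 0.8, 0.0]

# Hypothetical weights: each action's score reads the matching
# embedding entry, so the agent mimics the dominant teammate action.
weights = [[0.0] * OBS_DIM + [1.0 if j == a else 0.0 for j in range(N_ACTIONS)]
           for a in range(N_ACTIONS)]
action = policy([1.0, 0.0, 0.5], emb, weights)   # -> 2
```

The design point this illustrates is the second component above: rather than learning one fixed policy, the actor takes the teammate embedding as an extra input, so a single set of policy parameters can produce different behavior for different inferred teammate types.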
The authors evaluate POAM on various StarCraft II tasks and demonstrate that it outperforms baseline MARL and AHT approaches in terms of sample efficiency, asymptotic performance, and generalization to out-of-distribution teammates. The results show that the agent modeling module is crucial for POAM's improved performance, as it allows the agents to rapidly adapt their behaviors to the encountered teammates.
Key insights distilled from work by Caroline Wan... at arxiv.org, 04-17-2024. Source: https://arxiv.org/pdf/2404.10740.pdf