This paper proposes the novel problem setting of N-Agent Ad Hoc Teamwork (NAHT), which generalizes both cooperative multi-agent reinforcement learning (MARL) and ad hoc teamwork (AHT). It introduces the Policy Optimization with Agent Modeling (POAM) algorithm, which leverages agent modeling and policy gradient methods to enable cooperative behavior in the presence of varying numbers and types of uncontrolled teammates.
It presents a new approach to building teams of autonomous agents that can cooperate with diverse types of teammates to carry out joint tasks.
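
To make the problem setting concrete, the following is a minimal sketch of how a team might be composed for a single episode when the number of controlled agents varies. It is an illustration under stated assumptions, not the paper's procedure: the function name `sample_team`, the pool arguments, the uniform sampling, and the requirement of at least one controlled agent are all hypothetical choices made here.

```python
import random


def sample_team(controlled_pool, uncontrolled_pool, team_size, rng):
    """Form one episode's team from a mix of controlled and uncontrolled agents.

    Assumption (not from the paper): the number of controlled agents is drawn
    uniformly from 1..team_size, and each pool holds at least team_size agents.
    """
    # Number of agents the learner controls in this episode.
    n_controlled = rng.randint(1, team_size)
    # Controlled agents follow the learned policies.
    controlled = rng.sample(controlled_pool, n_controlled)
    # The remaining slots are filled by teammates the learner cannot control.
    uncontrolled = rng.sample(uncontrolled_pool, team_size - n_controlled)
    return controlled, uncontrolled


if __name__ == "__main__":
    rng = random.Random(0)
    controlled_pool = [f"learner_{i}" for i in range(5)]
    uncontrolled_pool = [f"teammate_{i}" for i in range(5)]
    controlled, uncontrolled = sample_team(controlled_pool, uncontrolled_pool, 5, rng)
    print("controlled:", controlled)
    print("uncontrolled:", uncontrolled)
```

Under these assumptions, an episode in which every slot is controlled resembles the fully cooperative MARL case, while an episode with a single controlled agent resembles AHT, which is one way to read the claim that NAHT generalizes both.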