Key Concepts
This paper introduces Best Response Shaping (BRS), a novel approach that trains an agent by differentiating through an opponent that approximates the best response, with the aim of learning reciprocity-based cooperative policies in partially competitive multi-agent environments.
Summary
The paper investigates multi-agent deep reinforcement learning in partially competitive environments, where traditional methods struggle to foster reciprocity-based cooperation. The authors introduce Best Response Shaping (BRS), which trains an agent by differentiating through an opponent, referred to as the "detective", that approximates the best response.
To enable the detective to condition on the agent's policy, the authors propose a state-aware differentiable conditioning mechanism facilitated by a question answering (QA) method. The agent is then trained by differentiating through the detective using the REINFORCE gradient estimator. Additionally, the authors propose self-play as a regularization method to encourage cooperative behavior.
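The REINFORCE (score-function) gradient estimator mentioned above can be sketched on a toy problem. This is not the BRS objective: in the paper the agent's return comes from playing against the detective, which conditions on the agent's policy, whereas here the opponent is a fixed, hypothetical stand-in that cooperates with probability `q` and ignores the agent. The one-shot payoffs (5/3/1/0) and the sigmoid policy parameterization are assumptions for illustration. The Monte Carlo estimate is checked against the closed-form gradient.

```python
import math
import random

# Assumed one-shot Prisoner's Dilemma payoffs to the agent (illustrative only):
# PAYOFF[(agent_action, opponent_action)]
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score(theta: float, action: str) -> float:
    """Score function d/dtheta log pi(action | theta) for P(C) = sigmoid(theta)."""
    p = sigmoid(theta)
    return (1.0 - p) if action == "C" else -p


def reinforce_gradient(theta: float, q: float, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo REINFORCE estimate of d E[reward] / d theta.

    The opponent is a fixed, hypothetical stand-in that cooperates with
    probability q; unlike the detective in BRS, it does not react to theta.
    """
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        a = "C" if rng.random() < p else "D"  # agent's sampled action
        b = "C" if rng.random() < q else "D"  # opponent's sampled action
        total += PAYOFF[(a, b)] * score(theta, a)
    return total / n_samples


def exact_gradient(theta: float, q: float) -> float:
    """Closed form: p(1 - p) * (E[reward | C] - E[reward | D])."""
    p = sigmoid(theta)
    r_c = q * PAYOFF[("C", "C")] + (1 - q) * PAYOFF[("C", "D")]
    r_d = q * PAYOFF[("D", "C")] + (1 - q) * PAYOFF[("D", "D")]
    return p * (1.0 - p) * (r_c - r_d)
```

With `theta = 0.0` and `q = 0.5`, `exact_gradient` returns -0.375 and `reinforce_gradient(0.0, 0.5, 200_000, random.Random(0))` lands close to it. The negative sign shows that, against an opponent that ignores the agent's policy, the naive gradient pushes the agent toward defection; conditioning the opponent on the agent's policy, as BRS does with the detective, is meant to address this failure mode.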
The authors empirically validate their method on the Iterated Prisoner's Dilemma (IPD) and the Coin Game. They show that the best response to POLA (Proximal Learning with Opponent-Learning Awareness) agents, approximated by Monte Carlo Tree Search (MCTS), does not fully cooperate, whereas the best response to BRS agents is full cooperation. The BRS agent also fully cooperates with itself, unlike the POLA agent.
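The evaluation question in this paragraph, whether the best response to a given agent fully cooperates, can be illustrated in the simplest case. The paper approximates the best response with MCTS; this sketch swaps in exact value iteration, which is feasible only because the agent here is a hypothetical memory-one IPD policy (its cooperation probability depends only on the previous joint action), so the responder faces a five-state MDP. The payoffs (T=5, R=3, P=1, S=0), the discount `gamma = 0.9`, and the example agents are assumptions, not the paper's settings.

```python
# Hypothetical payoffs to the responder: PAYOFF[(responder_action, agent_action)]
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
ACTIONS = ("C", "D")
# State = previous joint action (agent_action, responder_action), plus the first round.
STATES = ("start",) + tuple((a, b) for a in ACTIONS for b in ACTIONS)


def best_response(agent_coop_prob, gamma=0.9, iters=2000):
    """Value iteration for the responder's best response to a memory-one agent.

    agent_coop_prob maps each state to the agent's probability of cooperating.
    Returns (policy, values): the greedy responder action and value per state.
    """
    values = {s: 0.0 for s in STATES}
    q_values = {}
    for _ in range(iters):
        for s in STATES:
            p = agent_coop_prob[s]
            for x in ACTIONS:  # responder's action
                # Expected payoff over the agent's action y; the next state is (y, x).
                q_values[(s, x)] = sum(
                    prob * (PAYOFF[(x, y)] + gamma * values[(y, x)])
                    for y, prob in (("C", p), ("D", 1.0 - p))
                )
        # Synchronous update: all Q-values above used the previous iteration's values.
        values = {s: max(q_values[(s, x)] for x in ACTIONS) for s in STATES}
    policy = {s: max(ACTIONS, key=lambda x: q_values[(s, x)]) for s in STATES}
    return policy, values


def tit_for_tat():
    """Hypothetical agent: cooperate first, then copy the responder's previous action."""
    return {s: 1.0 if s == "start" or s[1] == "C" else 0.0 for s in STATES}


def always_cooperate():
    """Hypothetical agent that cooperates unconditionally."""
    return {s: 1.0 for s in STATES}
```

Against `tit_for_tat()` the computed best response cooperates in every state (value 30 from the start state), while against `always_cooperate()` it defects everywhere (value 50). This mirrors the qualitative contrast drawn in the paper between an agent whose best response is full cooperation and one whose best response is not, although the POLA and BRS agents are learned policies rather than hand-written tables like these.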
Statistics
The paper does not contain any key metrics or figures that support the authors' main arguments.
Quotes
The paper does not contain any striking quotes that support the authors' main arguments.