Core Concepts
This work introduces information-directed sampling (IDS) algorithms that are provably sample-efficient for learning Nash equilibria in multi-agent reinforcement learning, covering both two-player zero-sum Markov games and multi-player general-sum Markov games.
Abstract
This paper presents a set of novel algorithms based on the principle of information-directed sampling (IDS) for multi-agent reinforcement learning (MARL) problems. The key contributions are:
MAIDS Algorithm:
Designed for two-player zero-sum Markov games (MGs)
Employs an asymmetric learning structure: the max-player solves a minimax optimization of the joint information ratio, while the min-player minimizes the marginal information ratio (a simplified illustration of the information-ratio rule follows this list)
Achieves a Bayesian regret of Õ(√K) over K episodes
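To make the information-ratio idea concrete, the sketch below applies a deterministic IDS rule to a one-step zero-sum matrix game: the max-player scores each row by the squared gap to the best worst-case row value, divided by a posterior-variance proxy for information gain. This is only an illustration under simplifying assumptions; the paper's MAIDS operates on Markov games, optimizes over joint policies, and measures information gain via mutual information with the learning target rather than the variance proxy used here.

```python
import numpy as np

def max_player_ids_row(payoff_samples: np.ndarray) -> int:
    """Deterministic IDS-style rule for the max-player in a one-step zero-sum game.

    payoff_samples: posterior draws of the max-player's payoff matrix,
    shape (num_samples, num_rows, num_cols).
    Returns the row minimizing (worst-case regret)^2 / information-gain proxy.
    """
    mean_payoff = payoff_samples.mean(axis=0)        # (rows, cols)
    row_values = mean_payoff.min(axis=1)             # worst case over the min-player's columns
    regret = row_values.max() - row_values           # gap to the best row value
    # Posterior variance of each row's payoffs as a crude information-gain proxy.
    info_gain = payoff_samples.var(axis=0).mean(axis=1) + 1e-12
    return int(np.argmin(regret ** 2 / info_gain))

# Example: 200 posterior draws of a hypothetical 3x2 payoff matrix.
samples = np.random.default_rng(0).normal(size=(200, 3, 2))
print(max_player_ids_row(samples))
```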
Reg-MAIDS Algorithm:
An improved version of MAIDS with reduced computational complexity while maintaining the same Bayesian regret bound
Compressed-MAIDS Algorithm:
Leverages the flexibility of IDS in choosing the learning target
Constructs a compressed environment via rate-distortion theory and uses it as the learning target (a rate-distortion sketch follows this list)
Provides improved regret bounds compared to learning the full environment
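The rate-distortion construction can be illustrated with the classical Blahut–Arimoto iteration, which computes a stochastic compression of the environment that trades retained information against a distortion cost. The sketch below is generic: the prior over environments, the distortion matrix, and the Lagrange multiplier beta are placeholder assumptions, not the paper's specific choices for Compressed-MAIDS.

```python
import numpy as np

def blahut_arimoto(p_env: np.ndarray, distortion: np.ndarray,
                   beta: float, num_iters: int = 200) -> np.ndarray:
    """Rate-distortion-optimal stochastic compression q(target | environment).

    p_env:      prior over environments, shape (n_env,)
    distortion: distortion[i, j] = cost of representing environment i by
                compressed target j, shape (n_env, n_targets)
    beta:       Lagrange multiplier trading rate against distortion
    """
    n_env, n_targets = distortion.shape
    q_target = np.full(n_targets, 1.0 / n_targets)   # marginal over compressed targets
    cond = np.full((n_env, n_targets), 1.0 / n_targets)
    for _ in range(num_iters):
        # Optimal conditional for the current marginal.
        cond = q_target * np.exp(-beta * distortion)
        cond /= cond.sum(axis=1, keepdims=True)
        # Marginal induced by the conditional under the prior.
        q_target = p_env @ cond
    return cond

# Example: 4 environments compressed onto 2 targets with a random distortion matrix.
rng = np.random.default_rng(0)
q = blahut_arimoto(np.full(4, 0.25), rng.random((4, 2)), beta=5.0)
print(q.round(3))
```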
Extension to Multi-Player General-Sum MGs:
The Reg-MAIDS algorithm is extended to multi-player general-sum MGs
Can learn either a Nash equilibrium or a coarse correlated equilibrium in a sample-efficient manner (a no-regret sketch for approximating a CCE follows this list)
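As a concrete instance of the coarse correlated equilibrium (CCE) target, the sketch below approximates a CCE of a two-player general-sum normal-form game with regret matching, whose time-averaged joint play converges to the CCE set under no-regret dynamics. The payoff matrices are hypothetical stand-ins for the stage games of a Markov game; the paper's Reg-MAIDS extension learns the equilibrium from sampled transitions rather than from known payoffs.

```python
import numpy as np

def cce_via_regret_matching(A: np.ndarray, B: np.ndarray,
                            num_rounds: int = 5000, seed: int = 0) -> np.ndarray:
    """Approximate a CCE of a two-player general-sum normal-form game.

    A[i, j] is player 1's payoff and B[i, j] is player 2's payoff when the
    players choose actions (i, j). Under regret-matching (no-regret) play,
    the time-averaged joint action distribution approaches the CCE set.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = A.shape
    regrets = [np.zeros(n1), np.zeros(n2)]
    joint_counts = np.zeros((n1, n2))

    def strategy(r: np.ndarray) -> np.ndarray:
        pos = np.maximum(r, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(r.size, 1.0 / r.size)

    for _ in range(num_rounds):
        a1 = rng.choice(n1, p=strategy(regrets[0]))
        a2 = rng.choice(n2, p=strategy(regrets[1]))
        joint_counts[a1, a2] += 1
        # External-regret updates: payoff of each fixed deviation minus realized payoff.
        regrets[0] += A[:, a2] - A[a1, a2]
        regrets[1] += B[a1, :] - B[a1, a2]
    return joint_counts / num_rounds

# Example with hypothetical payoff matrices for a small general-sum game.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])
print(cce_via_regret_matching(A, B).round(3))
```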
The key innovation is applying the IDS principle to competitive and cooperative multi-agent settings, where it had not previously been explored. The resulting algorithms achieve favorable sample efficiency and computational efficiency while offering flexibility in the choice of learning target.