Key Concepts
This work introduces novel information-directed sampling (IDS) algorithms that are provably sample-efficient for learning Nash equilibria in multi-agent reinforcement learning settings, including two-player zero-sum Markov games and multi-player general-sum Markov games.
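For concreteness, a standard way to formalize the Õ(√K) claims below is through the max-player's Bayesian regret. The definition sketched here is an assumption about the paper's setup; the exact notation may differ:

```latex
% Assumed notation: V_1^*(s_1) is the Nash (minimax) value of the game from
% the initial state s_1, and (mu^k, nu^k) is the policy pair played by the
% two players in episode k out of K.
\mathfrak{BR}(K) \;=\; \mathbb{E}\!\left[ \sum_{k=1}^{K}
  \left( V_1^{*}(s_1) - V_1^{\mu^k,\,\nu^k}(s_1) \right) \right]
```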
Summary
This paper presents a set of novel algorithms based on the principle of information-directed sampling (IDS) for multi-agent reinforcement learning (MARL) problems. The key contributions are:
- MAIDS Algorithm (a rough sketch follows this item):
  - Designed for two-player zero-sum Markov games (MGs)
  - Employs an asymmetric learning structure: the max-player solves a minimax optimization problem based on the joint information ratio, while the min-player minimizes the marginal information ratio
  - Achieves a Bayesian regret of Õ(√K) over K episodes
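To make the asymmetric structure concrete, here is a minimal sketch of information-ratio-based selection over finite policy sets. Everything here is a hypothetical simplification: `gap` and `info` stand for posterior estimates of expected regret and expected information gain, which the paper derives from full Markov-game quantities rather than precomputed matrices.

```python
import numpy as np

def joint_information_ratio(gap, info, eps=1e-12):
    """Squared expected regret divided by expected information gain."""
    return gap ** 2 / (info + eps)

def maids_max_player(gap, info):
    """Max-player: minimize the worst-case (over the min-player's policies)
    joint information ratio -- the minimax step described above."""
    ratio = joint_information_ratio(gap, info)
    return int(np.argmin(ratio.max(axis=1)))

def maids_min_player(marginal_gap, marginal_info, eps=1e-12):
    """Min-player: minimize its marginal information ratio, taking the
    max-player's committed choice as given."""
    return int(np.argmin(marginal_gap ** 2 / (marginal_info + eps)))

# Toy usage with random posterior estimates (illustration only).
rng = np.random.default_rng(0)
gap, info = rng.random((4, 3)), rng.random((4, 3)) + 0.1
i = maids_max_player(gap, info)
j = maids_min_player(gap[i], info[i])
print(f"policy pair selected for this episode: ({i}, {j})")
```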
- Reg-MAIDS Algorithm (a possible form is sketched below):
  - An improved version of MAIDS that reduces computational complexity while maintaining the same Õ(√K) Bayesian regret bound
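The summary does not say where the computational savings come from. One plausible reading, assuming "Reg" stands for "regularized" as in regularized-IDS variants from the bandit literature, replaces the ratio with an additive objective and so avoids the ratio-based minimax; the sketch below reuses the hypothetical names from the previous snippet.

```python
import numpy as np

def regularized_objective(gap, info, lam):
    """Additive surrogate for the information ratio: expected regret minus
    lam times expected information gain (lam trades the two off)."""
    return gap - lam * info

def reg_maids_max_player(gap, info, lam=1.0):
    # Same worst-case-then-minimize structure as MAIDS, but a single
    # additive objective replaces the squared-regret-over-information ratio.
    obj = regularized_objective(gap, info, lam)
    return int(np.argmin(obj.max(axis=1)))
```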
- Compressed-MAIDS Algorithm (the rate-distortion idea is sketched below):
  - Leverages the flexibility of IDS in choosing the learning target
  - Constructs a compressed environment based on rate-distortion theory and uses it as the learning target
  - Provides improved regret bounds compared to learning the full environment
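To illustrate the rate-distortion idea, here is a minimal Blahut-Arimoto sketch that trades the rate (information retained about the true environment) against a distortion measure; this is standard rate-distortion machinery, not the paper's construction. One natural choice of `distortion`, purely as an assumption, is the gap in equilibrium values between an environment and its compressed surrogate.

```python
import numpy as np

def blahut_arimoto(p_env, distortion, beta, n_iters=200):
    """p_env: (n,) prior over candidate environments.
    distortion: (n, m) cost of representing environment i by target j.
    beta: Lagrange multiplier trading rate against expected distortion."""
    n, m = distortion.shape
    q_target = np.full(m, 1.0 / m)                    # marginal over targets
    for _ in range(n_iters):
        # Optimal conditional q(target | env) for the current marginal.
        cond = q_target * np.exp(-beta * distortion)  # shape (n, m)
        cond /= cond.sum(axis=1, keepdims=True)
        q_target = p_env @ cond                       # updated marginal
    rate = np.sum(p_env[:, None] * cond *
                  np.log(cond / (q_target[None, :] + 1e-30) + 1e-30))
    avg_dist = np.sum(p_env[:, None] * cond * distortion)
    return cond, rate, avg_dist
```

Sweeping `beta` traces out the rate-distortion curve; the improved regret bounds correspond to targeting a lower-rate surrogate whose distortion stays small.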
- Extension to Multi-Player General-Sum MGs (an equilibrium-computation toy example follows):
  - Extends Reg-MAIDS to multi-player general-sum MGs
  - Learns either a Nash equilibrium or a coarse correlated equilibrium (CCE) in a sample-efficient manner
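For intuition about the two solution concepts, the toy sketch below computes a coarse correlated equilibrium of a single-stage two-player general-sum game as a linear feasibility program (the extension handles Markov games and more players; all names here are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def cce_lp(U1, U2):
    """Find a joint distribution mu over action pairs such that neither
    player gains by unilaterally deviating before the joint action is drawn.
    U1, U2: (n, m) payoff matrices for players 1 and 2."""
    n, m = U1.shape
    A_ub, b_ub = [], []
    for a1 in range(n):   # player 1 deviating to fixed action a1
        A_ub.append((U1[a1][None, :] - U1).ravel()); b_ub.append(0.0)
    for a2 in range(m):   # player 2 deviating to fixed action a2
        A_ub.append((U2[:, a2][:, None] - U2).ravel()); b_ub.append(0.0)
    res = linprog(c=np.zeros(n * m), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=[np.ones(n * m)], b_eq=[1.0],
                  bounds=[(0, 1)] * (n * m))
    return res.x.reshape(n, m)  # mu(a1, a2)
```

For two-player zero-sum games, the same linear-programming machinery recovers a Nash equilibrium directly, which is why the zero-sum algorithms above can target Nash equilibria rather than the weaker CCE.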
The key innovation is the application of the IDS principle to competitive and cooperative multi-agent settings, which was previously unexplored. The algorithms provably achieve favorable sample efficiency, computational efficiency, and flexibility in the choice of learning target.