Key Concepts
Proposes the PSRL-ZSG algorithm for zero-sum stochastic games against an arbitrary opponent and proves a Bayesian regret bound of Õ(HS√(AT)).
Abstract
Recent advances in competitive reinforcement learning.
Self-play algorithms in reinforcement learning.
The PSRL-ZSG algorithm for zero-sum stochastic games.
Analysis of regret bounds and comparison with existing algorithms.
Theoretical understanding of multi-agent reinforcement learning.
Statistics
The PSRL-ZSG algorithm achieves a Bayesian regret bound of Õ(HS√(AT)).
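As a rough illustration of what this bound implies, the following sketch evaluates only its leading term H·S·√(A·T). It assumes the usual reading of the symbols (S states, A actions, T time steps, H a horizon or span-type constant), which this summary does not spell out, and it drops the constants and logarithmic factors hidden by Õ. The result shows scaling behaviour only, not an actual regret value. The function name regret_bound_scale is hypothetical.

```python
import math


def regret_bound_scale(H: float, S: int, A: int, T: int) -> float:
    """Leading term H * S * sqrt(A * T) of the O~(HS sqrt(AT)) bound.

    Assumption: constants and log factors hidden by O~ are dropped,
    so only relative comparisons between settings are meaningful.
    """
    if H <= 0 or S <= 0 or A <= 0 or T <= 0:
        raise ValueError("H, S, A and T must all be positive")
    return H * S * math.sqrt(A * T)


# Hypothetical problem sizes chosen only for illustration.
short_run = regret_bound_scale(H=2.0, S=10, A=4, T=10_000)
long_run = regret_bound_scale(H=2.0, S=10, A=4, T=40_000)

print(short_run)              # 4000.0
print(long_run / short_run)   # 2.0: quadrupling T only doubles the bound
print(short_run / 10_000)     # 0.4: bound per time step
print(long_run / 40_000)      # 0.2: per-step bound shrinks like 1/sqrt(T)
```

Because the bound grows as √T rather than T, the average regret per step tends to zero as T grows, which is the sense in which such a bound is "sublinear".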
Quotes
"PSRL-ZSG algorithm achieves a Bayesian regret bound of Õ(HS√(AT))."