Core Concepts
The authors investigate robustness to data corruption in offline two-player zero-sum Markov games, proposing algorithms for different coverage and corruption scenarios that achieve near-optimal suboptimality-gap bounds with respect to the corruption level ϵ.
Summary
The study focuses on learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under various data corruption scenarios. The authors provide lower bounds, propose robust algorithms, and analyze the impact of coverage assumptions on learning outcomes.
The research addresses the challenge of adversarial data corruption in multi-agent reinforcement learning, emphasizing the role of coverage assumptions in successful policy learning. By introducing novel bonus terms and robust estimators, the study shows how the effects of data corruption can be mitigated.
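As a rough illustration of the robust-estimation idea, the following Python sketch fits a least-squares model and then iteratively refits after discarding the points with the largest residuals, which limits the influence of a small corrupted fraction of the data. The trimming heuristic, the function name `trimmed_least_squares`, and the synthetic data are assumptions made for illustration, not the estimator analyzed in the paper.

```python
import numpy as np

def trimmed_least_squares(X, y, eps=0.1, n_rounds=3):
    """Least squares that repeatedly discards the eps-fraction of points
    with the largest residuals -- a generic robust-regression heuristic,
    not the specific estimator analyzed in the paper."""
    keep = np.arange(len(y))
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_rounds):
        residuals = np.abs(X[keep] @ theta - y[keep])
        cutoff = np.quantile(residuals, 1.0 - eps)
        keep = keep[residuals <= cutoff]
        theta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return theta

# Toy data: linear rewards where an eps-fraction of labels is adversarially shifted.
rng = np.random.default_rng(0)
K, d, eps = 1000, 5, 0.1
X = rng.normal(size=(K, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.01 * rng.normal(size=K)
corrupt = rng.choice(K, size=int(eps * K), replace=False)
y[corrupt] = -y[corrupt] + 5.0  # adversarial corruption of the chosen samples
print(np.linalg.norm(trimmed_least_squares(X, y, eps) - theta_true))
```

On this toy data the trimmed fit stays close to the true parameter even though an ϵ-fraction of labels is adversarially shifted.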
Key points include:
- Investigating data corruption robustness in offline two-player zero-sum Markov games.
- Proposing algorithms under different coverage and corruption scenarios.
- Analyzing the impact of coverage assumptions on learning outcomes.
- Introducing novel bonus terms and robust estimators to mitigate data corruption effects (see the bonus-term sketch after this list).
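To make the bonus-term idea concrete, here is a minimal Python sketch of a pessimistic value estimate that subtracts an elliptical uncertainty bonus computed from the empirical feature covariance of the offline data. The single-step setting, the function `pessimistic_q`, and the constants β and λ are simplifying assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def pessimistic_q(phi_query, Phi_data, theta_hat, beta=1.0, lam=1.0):
    """Linear value estimate minus an elliptical uncertainty bonus.

    phi_query : (d,) feature of the state-action pair being evaluated
    Phi_data  : (K, d) features observed in the offline dataset
    theta_hat : (d,) estimated linear parameter
    The bonus beta * sqrt(phi^T Lambda^{-1} phi) grows where the dataset
    provides little coverage, so the estimate is pessimistic there.
    """
    d = Phi_data.shape[1]
    Lambda = Phi_data.T @ Phi_data + lam * np.eye(d)  # regularized feature covariance
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lambda, phi_query))
    return phi_query @ theta_hat - bonus

# A well-covered direction gets a small bonus; a barely covered one gets a large bonus.
rng = np.random.default_rng(1)
Phi_data = rng.normal(size=(500, 3)) * np.array([1.0, 1.0, 0.05])
theta_hat = np.array([0.5, -0.2, 0.3])
print(pessimistic_q(np.array([1.0, 0.0, 0.0]), Phi_data, theta_hat))
print(pessimistic_q(np.array([0.0, 0.0, 1.0]), Phi_data, theta_hat))
```

The bonus is small in directions the dataset covers well and large in poorly covered directions, which is why coverage assumptions play a central role in the resulting guarantees.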
Statistics
We prove an information-theoretic lower bound of Ω(Hdϵ) on the suboptimality gap.
RLS-PMVI achieves a suboptimality gap upper bounded by H(H + γ)²·poly(d)·κ²/K + H(H + γ)·κ·ϵ.
SCRAM-PMVI returns a suboptimality gap upper bounded by (1/√c₁)·(γ + H)·H·√d·ϵ + H²d/√(c₁K).
SCRAM-PMVI with Assumption 4 yields a suboptimality gap upper bounded by (1/(c₁c₂))·H²·d^(3/2)·ϵ + H²·d^(3/2)/(c₁c₂·√K).
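In these bounds the √K terms vanish as the dataset grows while the ϵ terms persist, matching the Ω(Hdϵ) lower bound up to problem-dependent factors. The short Python check below plugs illustrative placeholder values of H, d, γ, c₁, and ϵ (chosen for illustration, not taken from the paper) into the SCRAM-PMVI bound above to show which term dominates for large K.

```python
# Evaluate the two terms of the SCRAM-PMVI bound stated above:
#   (1/sqrt(c1)) * (gamma + H) * H * sqrt(d) * eps  +  H**2 * d / sqrt(c1 * K)
# All constants below are illustrative assumptions, not values from the paper.
from math import sqrt

H, d, gamma, c1, eps = 5, 10, 1.0, 0.5, 0.01

for K in [10**3, 10**5, 10**7]:
    corruption_term = (gamma + H) * H * sqrt(d) * eps / sqrt(c1)
    statistical_term = H**2 * d / sqrt(c1 * K)
    print(f"K={K:>8}: corruption={corruption_term:.3f}, statistical={statistical_term:.3f}")
```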
Quotes
"We are the first to provide such a characterization of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption." - Andi Nika et al.