Core Concepts
RL-CFR introduces dynamic action abstraction in IIEFGs, improving performance without increasing solving time.
Summary
RL-CFR presents a novel approach to dynamic action abstraction in Imperfect Information Extensive-Form Games (IIEFGs). Because solving the full game is computationally intractable, existing methods rely on fixed action abstractions, which leads to sub-optimal performance. RL-CFR instead combines reinforcement learning (RL) with counterfactual regret minimization (CFR) for strategy derivation: it constructs a game tree with RL-guided action abstractions, achieving a higher expected payoff without increased solving time. In experiments on Heads-up No-limit Texas Hold'em (HUNL), RL-CFR significantly outperforms existing methods.
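To make the CFR side of the method concrete, here is a minimal sketch of regret matching, the update CFR uses to turn accumulated counterfactual regrets into a strategy at an information set. The fold/call/raise example and the function name are illustrative, not taken from the RL-CFR code.

```python
import numpy as np

def regret_matching(cumulative_regrets: np.ndarray) -> np.ndarray:
    """Derive a strategy from cumulative counterfactual regrets.

    Actions with positive regret are played in proportion to that
    regret; if no action has positive regret, play uniformly.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full_like(cumulative_regrets, 1.0 / len(cumulative_regrets))

# Example: regrets for fold / call / raise at one information set.
regrets = np.array([-1.0, 3.0, 1.0])
print(regret_matching(regrets))  # -> [0.   0.75 0.25]
```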
RL-CFR combines deep reinforcement learning with CFR to address the challenges that mixed strategies and probabilistic rewards pose in IIEFGs. The framework integrates a self-training process for dynamic action-abstraction selection. Each training step compresses the public belief state (PBS) into a feature representation, selects an action abstraction through a network with added exploration noise, builds a depth-limited subgame from that abstraction, and computes the reward from the resulting PBS value.
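The toy loop below sketches that training cycle under stated assumptions: every component here (the PBS encoder, the noise-added abstraction selector, the subgame-solver stub, and all dimensions) is a placeholder standing in for the paper's actual networks and CFR solver, not their implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's real PBS encoding is far richer.
PBS_DIM, N_CANDIDATE_ACTIONS = 16, 4

def encode_pbs(pbs: np.ndarray) -> np.ndarray:
    """Compress a public belief state into a fixed-size feature vector."""
    return np.tanh(pbs)  # placeholder encoder

def select_abstraction(features, weights, noise_scale=0.1):
    """Score candidate action abstractions and pick one, with added
    exploration noise (the 'noise-added network' step)."""
    scores = features @ weights + noise_scale * rng.standard_normal(N_CANDIDATE_ACTIONS)
    return int(np.argmax(scores))

def solve_depth_limited_subgame(pbs, abstraction_id) -> float:
    """Stub: a real implementation would run CFR on a depth-limited
    subgame built from the chosen abstraction and return the root
    PBS value, which serves as the RL reward."""
    return float(np.dot(pbs, pbs) * 0.01 + abstraction_id * 0.001)

weights = rng.standard_normal((PBS_DIM, N_CANDIDATE_ACTIONS))
for epoch in range(3):
    pbs = rng.standard_normal(PBS_DIM)  # sample a training situation
    feats = encode_pbs(pbs)
    action = select_abstraction(feats, weights)
    reward = solve_depth_limited_subgame(pbs, action)
    # A real implementation would now update the network toward
    # abstractions that earn higher PBS-value rewards.
    print(f"epoch={epoch} abstraction={action} reward={reward:.4f}")
```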
The experiments demonstrate RL-CFR's superiority over fixed-action-abstraction baselines, namely a replication of ReBeL and Slumbot, in HUNL. The framework achieves substantial win-rate margins of 64 ± 11 and 84 ± 17 mbb/hand against these two opponents, respectively. Exploitability evaluations further show that RL-CFR produces less exploitable strategies than the other methods.
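For context on the unit: mbb/hand means milli-big-blinds per hand, where 1,000 mbb equals one big blind, so a 64 mbb/hand margin is an average profit of 0.064 big blinds per hand. A minimal conversion helper (illustrative, not from any poker toolkit):

```python
def winnings_to_mbb_per_hand(total_bb_won: float, hands: int) -> float:
    """Convert total winnings (in big blinds) over a match into the
    standard mbb/hand metric: 1 big blind = 1000 milli-big-blinds."""
    return 1000.0 * total_bb_won / hands

# e.g. winning 6.4 big blinds over 100 hands averages 64 mbb/hand
print(winnings_to_mbb_per_hand(6.4, 100))  # -> 64.0
```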
Statistics
RL-CFR achieves significant win-rate margins of 64 ± 11 and 84 ± 17 mbb/hand in HUNL experiments.
The exploitability of RL-CFR is measured at 17 ± 0 mbb/hand.
The PBS value network, trained for over 5 × 10⁵ epochs, comprises approximately 18 million parameters.
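As a rough sense of scale, the sketch below builds a hypothetical multilayer perceptron whose parameter count lands near the reported 18 million. The input size and layer widths are assumptions chosen only to match that figure; the paper's actual PBS value-network architecture is not described here.

```python
import torch
import torch.nn as nn

# Hypothetical PBS value network; input size and widths are assumed.
PBS_FEATURES = 500

value_net = nn.Sequential(
    nn.Linear(PBS_FEATURES, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 1),  # scalar value of the public belief state
)

n_params = sum(p.numel() for p in value_net.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~17.8M, close to 18M
```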