Core Concepts
Even without knowing the game matrix or observing any payoffs, it is possible to exploit a wide variety of deterministic behavioral biases exhibited by an opponent to win nearly every round in a symmetric zero-sum game.
Abstract
The paper considers symmetric, repeated, two-player, zero-sum games where the player does not know the game matrix or observe any payoffs, but can observe the opponent's actions. It models several deterministic, behaviorally-biased opponents and shows how to exploit each bias to win nearly every round.
Key highlights:
For the Myopic Best Responder opponent, the player first learns the best response to each action, then predicts the opponent's next action and plays the best response to it, winning every round after the first n+1 rounds (where n is the number of actions).
For the Gambler's Fallacy opponent, the player can learn best responses to the opponent's "most overdue" action and then force the opponent to play that action to win every round after the first 3n rounds.
For the Win-Stay Lose-Shift opponent (with variants for how ties are treated), the player can learn the opponent's action ordering and best responses to win all but a bounded number of rounds.
For the Follow-the-Leader opponent, the player can use the ellipsoid algorithm to estimate the game matrix and then play best responses to the predicted actions to win all but a bounded number of rounds.
The paper also provides a partial characterization of the kinds of behavioral strategies that can be exploited to win nearly every round, and shows that in some cases, the player can win nearly every round against a biased opponent even if they do not know which behavioral strategy the opponent is using.
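The Myopic Best Responder highlight above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's own procedure: it assumes the opponent best-responds to the player's previous action, that every action has a unique best response that strictly beats it, and that the opponent's first action is arbitrary. The names `best_response`, `simulate_myopic_exploit`, and `RPS` are hypothetical, and rock-paper-scissors is used only as a stand-in for the hidden game matrix. The player never reads the matrix or the payoffs; only the simulator does.

```python
# Hidden game matrix (assumption: rock-paper-scissors, 0=rock, 1=paper, 2=scissors).
# RPS[i][j] is the payoff to a player choosing i against an opponent choosing j.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(matrix, action):
    """Simulator-only helper: the action that scores highest against `action`."""
    return max(range(len(matrix)), key=lambda j: matrix[j][action])


def simulate_myopic_exploit(matrix, rounds):
    """Return per-round outcomes for the player: 1 = win, 0 = tie, -1 = loss."""
    n = len(matrix)
    learned = {}    # player's table: own action -> opponent's observed reply
    my_prev = None  # player's action in the previous round
    outcomes = []
    for t in range(rounds):
        # Player's move, chosen from past observations only.
        if t < n:
            mine = t  # exploration: try each action once
        elif my_prev in learned and learned[my_prev] in learned:
            predicted = learned[my_prev]  # opponent will reply to my last move
            mine = learned[predicted]     # play what beats that reply
        else:
            mine = 0  # one filler round while the last reply is still unseen

        # Opponent's move: arbitrary first action, then a myopic best response.
        opp = 0 if my_prev is None else best_response(matrix, my_prev)

        # The player observes the opponent's action (but no payoff).
        if my_prev is not None:
            learned[my_prev] = opp

        # Only the simulator scores the round.
        payoff = matrix[mine][opp]
        outcomes.append((payoff > 0) - (payoff < 0))
        my_prev = mine
    return outcomes


outcomes = simulate_myopic_exploit(RPS, 10)
print(outcomes)
```

With n = 3, the first n+1 = 4 rounds are spent observing the opponent's replies, and every later round in this run is a win for the player, which matches the bound stated in the highlight.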