Core Concepts
Welfare Equilibria (WE) provide a generalization of Stackelberg strategies that can recover desirable Nash Equilibria in non-coincidental games, where the Stackelberg strategy profile fails. The Welfare Function Search (WelFuSe) algorithm adaptively chooses an appropriate welfare function to avoid catastrophe in self-play while preserving performance against naive learning opponents.
Abstract
Learning in multi-agent systems is challenging because agents may have misaligned incentives and the environment is non-stationary. Opponent shaping (OS) approaches, which explicitly account for the opponent's incentives and behavior, have been proposed to address these challenges.
The authors first show that the Stackelberg strategy profile, in which both players choose Stackelberg strategies, represents a sensible solution concept in many two-player games. They then demonstrate that several existing OS algorithms can be derived as approximations to Stackelberg strategies.
However, the authors identify a class of "non-coincidental games" in which the Stackelberg strategy profile is not a Nash Equilibrium (NE). This includes several canonical matrix games, such as the Chicken Game, where the Stackelberg strategy profile leads to catastrophic outcomes in self-play.
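To make the Chicken Game failure concrete, here is a minimal sketch restricted to pure strategies in a one-shot matrix game. The payoff numbers and the helper names `stackelberg_action` and `is_pure_nash` are illustrative assumptions, not taken from the source.

```python
import numpy as np

# Hypothetical Chicken payoffs (assumed for illustration).
# Action 0 = swerve, action 1 = go straight.
# R1[a, b] is player 1's payoff, R2[a, b] is player 2's payoff.
R1 = np.array([[0.0, -1.0],
               [1.0, -10.0]])
R2 = R1.T  # the game is symmetric

def stackelberg_action(leader_payoff, follower_payoff):
    """Pure Stackelberg action for a leader whose action indexes axis 0."""
    # Follower's best reply to each possible leader action.
    best_replies = follower_payoff.argmax(axis=1)
    rows = np.arange(leader_payoff.shape[0])
    # Leader's payoff for each action, given that best reply.
    leader_values = leader_payoff[rows, best_replies]
    return int(leader_values.argmax())

def is_pure_nash(a, b, R1, R2):
    """True if neither player gains by deviating unilaterally from (a, b)."""
    return bool(R1[a, b] >= R1[:, b].max() and R2[a, b] >= R2[a, :].max())

# Each player computes a Stackelberg action as if it were the leader.
a1 = stackelberg_action(R1, R2)
a2 = stackelberg_action(R2.T, R1.T)  # transpose so player 2 indexes axis 0

crash_payoffs = (float(R1[a1, a2]), float(R2[a1, a2]))
profile_is_nash = is_pure_nash(a1, a2, R1, R2)
```

With these assumed payoffs, both players commit to going straight, so the resulting profile is the mutual crash, and `is_pure_nash` reports that it is not a Nash Equilibrium because either player would rather swerve.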
To address this issue, the authors introduce Welfare Equilibria (WE) as a generalization of Stackelberg strategies. In a WE, each player chooses a welfare function and maximizes it under the assumption that the opponent plays a best response. The authors show that appropriate choices of welfare function, such as egalitarian or fairness-based functions, can recover desirable NE solutions in non-coincidental games.
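The mechanism described above (maximize a chosen welfare function while assuming the opponent best-responds) can be sketched as follows for a pure-strategy, one-shot Chicken Game. The payoffs, the `WELFARE` dictionary, and the specific formulas (own payoff for "greedy", the minimum of both payoffs for "egalitarian") are assumptions for illustration; the source does not give the authors' exact definitions.

```python
import numpy as np

# Hypothetical Chicken payoffs (assumed). Action 0 = swerve, 1 = go straight.
R1 = np.array([[0.0, -1.0],
               [1.0, -10.0]])
R2 = R1.T

# Assumed welfare functions of (own payoff, opponent payoff).
WELFARE = {
    "greedy": lambda r_self, r_other: r_self,                 # plain Stackelberg objective
    "egalitarian": lambda r_self, r_other: min(r_self, r_other),
}

def welfare_values(welfare, own_payoff, other_payoff):
    """Welfare of each own action when the opponent best-responds to it."""
    values = []
    for a in range(own_payoff.shape[0]):
        b = int(other_payoff[a].argmax())  # opponent's best reply to action a
        values.append(float(welfare(own_payoff[a, b], other_payoff[a, b])))
    return values

greedy_vals = welfare_values(WELFARE["greedy"], R1, R2)
egal_vals = welfare_values(WELFARE["egalitarian"], R1, R2)
```

Under these assumptions the greedy welfare function reproduces the Stackelberg preference for going straight (`greedy_vals` is `[-1.0, 1.0]`), while the egalitarian one values both actions equally (`egal_vals` is `[-1.0, -1.0]`), so going straight is no longer strictly preferred. This pure-strategy toy only shows how the choice of welfare function changes the objective; it does not reproduce the authors' equilibrium results.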
Finally, the authors present Welfare Function Search (WelFuSe), a practical algorithm that adaptively chooses the best welfare function from a predefined set, based on experience. WelFuSe is able to preserve performance against naive learning opponents while avoiding catastrophe in self-play, by learning to select non-greedy welfare functions when appropriate.
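The source does not spell out WelFuSe's selection rule, so the following is only a toy sketch of the general idea of picking a welfare function from a fixed set based on experienced returns. It is not the authors' procedure. The payoffs, the welfare formulas, the tie-breaking, the `select_welfare` helper, and the modelling of a naive opponent as a simple best-responder are all assumptions.

```python
import numpy as np

# Hypothetical Chicken payoffs (assumed). Action 0 = swerve, 1 = go straight.
R1 = np.array([[0.0, -1.0],
               [1.0, -10.0]])
R2 = R1.T

# Assumed candidate set of welfare functions of (own payoff, opponent payoff).
WELFARE_FUNCTIONS = {
    "greedy": lambda r_self, r_other: r_self,
    "egalitarian": lambda r_self, r_other: min(r_self, r_other),
}

def welfare_action(name):
    """Action maximizing the named welfare, assuming the opponent best-responds.
    Ties go to the lower action index (an arbitrary choice)."""
    w = WELFARE_FUNCTIONS[name]
    values = []
    for a in range(R1.shape[0]):
        b = int(R2[a].argmax())
        values.append(w(R1[a, b], R2[a, b]))
    return int(np.argmax(values))

def episode_return(name, opponent):
    """Own payoff from one episode against the given opponent type."""
    a = welfare_action(name)
    if opponent == "self_play":
        b = a  # a copy of the agent uses the same welfare function and action
    else:
        b = int(R2[a].argmax())  # "naive" stand-in: best-responds to our action
    return float(R1[a, b])

def select_welfare(opponent, episodes=5):
    """Try every welfare function and keep the one with the best mean return."""
    mean_returns = {
        name: float(np.mean([episode_return(name, opponent) for _ in range(episodes)]))
        for name in WELFARE_FUNCTIONS
    }
    return max(mean_returns, key=mean_returns.get)

choice_vs_naive = select_welfare("naive")
choice_in_self_play = select_welfare("self_play")
```

In this toy setting the greedy welfare function is kept against the best-responding opponent, while the egalitarian one is selected in self-play because the greedy choice leads to the mutual crash. The environment here is deterministic, so the averaging over episodes only indicates where experience would enter in a stochastic setting.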
Stats
The source does not report any key metrics or figures in support of the authors' main arguments.
Quotes
The source does not contain any notable quotes in support of the authors' main arguments.