Core Concepts
The authors present a unified reinforcement learning algorithm that can solve both mean field game and mean field control problems in continuous state and action spaces.
Abstract
The paper introduces an infinite horizon mean field actor-critic (IH-MF-AC) algorithm that can efficiently solve mean field game (MFG) and mean field control (MFC) problems in continuous spaces. The key contributions are:
The algorithm uses an actor-critic framework to learn the optimal control policy and value function, while simultaneously learning a representation of the mean field distribution via a parameterized score function (both pieces are sketched in code below). This allows the algorithm to handle continuous state and action spaces.
The algorithm can converge to either the MFG equilibrium or the MFC optimum by adjusting the relative learning rates of the actor, critic, and mean field distribution components. This unifies the treatment of MFG and MFC problems (a sketch of this two-timescale mechanism follows the algorithm overview below).
The mean field distribution is represented using a parameterized score function, which is updated via score matching techniques. This allows efficient sampling from the mean field distribution using Langevin dynamics.
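As a concrete illustration, the following minimal Python sketch implements this idea in one dimension with a deliberately simple linear score model s(x) = a*x + b (exact only for Gaussian distributions; the paper's parameterization is more general). The score is fit by implicit score matching, and samples are drawn with unadjusted Langevin dynamics; all names and constants here are assumptions for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = -0.5, 0.0  # score parameters: s(x) = a*x + b (an assumed linear model)

def score(x):
    """Parameterized score s_theta(x), approximating d/dx log mu(x)."""
    return a * x + b

def score_matching_step(xs, lr=5e-2):
    """One gradient step on the implicit score-matching objective
    E[0.5 * s(x)^2 + s'(x)]; here s'(x) = a, so d/da = E[s*x] + 1, d/db = E[s]."""
    global a, b
    s = score(xs)
    a -= lr * (np.mean(s * xs) + 1.0)
    b -= lr * np.mean(s)

def langevin_samples(n=10_000, steps=500, eps=5e-2):
    """Unadjusted Langevin dynamics targeting the density whose score is s:
    x <- x + (eps / 2) * s(x) + sqrt(eps) * xi, with xi ~ N(0, 1)."""
    x = rng.standard_normal(n)
    for _ in range(steps):
        x += 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(n)
    return x

# Fit the score to data from a hidden N(2, 1) population, then sample from it.
data = 2.0 + rng.standard_normal(10_000)
for _ in range(500):
    score_matching_step(data)
print(langevin_samples().mean())  # approximately 2
```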
The paper first reviews the mathematical formulation of infinite horizon MFG and MFC problems. It then describes the reinforcement learning background, including temporal difference methods and actor-critic algorithms. The IH-MF-AC algorithm is then presented, detailing the updates for the actor, critic, and mean field score function. The authors provide intuition on how the relative learning rates can be used to converge to either the MFG or MFC solution.
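The following self-contained Python sketch mirrors that structure under strong simplifying assumptions: a linear critic updated by TD(0), a Gaussian actor with a linear mean updated by a policy-gradient step, and a linear mean field score updated by score matching, each with its own learning rate, driven by the linear-quadratic benchmark from the Stats section. The two learning-rate regimes encode the two-timescale intuition (updating the distribution more slowly than the agent targets the MFG equilibrium; more quickly, the MFC optimum); all constants, features, and model choices are illustrative, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretization and LQ benchmark constants (assumed illustrative values).
dt, sigma, gamma = 0.01, 0.5, 0.99
c1, c2, c3, c4, c5 = 0.25, 1.5, 0.5, 0.6, 1.0

# Two-timescale learning-rate regimes (values are assumptions): a slow score
# update freezes the population while the agent best-responds (MFG regime);
# a fast one lets the population immediately track the control (MFC regime).
MFG = dict(critic=1e-2, actor=1e-3, score=1e-4)
MFC = dict(critic=1e-3, actor=1e-4, score=1e-2)
lr = MFG

def phi(x):
    """Bounded polynomial features of the rescaled scalar state."""
    z = x / 5.0
    return np.array([1.0, z, z * z])

v_w = np.zeros(3)       # critic weights: V(x) ~ v_w . phi(x)
pi_w = np.zeros(3)      # actor weights: policy mean = pi_w . phi(x)
sig_pi = 0.3            # fixed exploration noise of the Gaussian policy
a_s, b_s = -1.0, 0.0    # linear mean field score s(x) = a_s * x + b_s
x, m = 0.0, 0.0         # current state and mean field first moment

for _ in range(100_000):
    # Act with the Gaussian policy, then take one Euler-Maruyama step of
    # dX_t = alpha_t dt + sigma dW_t (state clipped to keep the sketch stable).
    mean_a = pi_w @ phi(x)
    alpha = mean_a + sig_pi * rng.standard_normal()
    noise = sigma * np.sqrt(dt) * rng.standard_normal()
    x_next = np.clip(x + alpha * dt + noise, -5.0, 5.0)
    cost = (0.5 * alpha**2 + c1 * (x - c2 * m)**2
            + c3 * (x - c4)**2 + c5 * m**2) * dt

    # Critic: semi-gradient TD(0) on the discounted cost.
    td = cost + gamma * v_w @ phi(x_next) - v_w @ phi(x)
    v_w += lr["critic"] * td * phi(x)

    # Actor: policy-gradient step, descending because cost is minimized.
    pi_w -= lr["actor"] * td * (alpha - mean_a) / sig_pi**2 * phi(x)

    # Mean field: one implicit score-matching step on the visited state.
    s = a_s * x + b_s
    a_s -= lr["score"] * (s * x + 1.0)
    b_s -= lr["score"] * s
    if a_s < 0:  # a linear score induces a Gaussian with mean -b_s / a_s
        m = -b_s / a_s
    x = x_next
```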
Finally, the algorithm is evaluated on a linear-quadratic benchmark problem for which the explicit MFG and MFC solutions are known. The numerical results demonstrate that the algorithm converges to the correct solution under the corresponding choice of learning rates.
Stats
The state dynamics are given by the stochastic differential equation:
dX_t = α_t dt + σ dW_t
The running cost function to be minimized is:
(1/2) α_t^2 + c_1 (X_t - c_2 m)^2 + c_3 (X_t - c_4)^2 + c_5 m^2
where m = ∫ x μ(dx) is the first moment of the mean field distribution μ.
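For concreteness, here is a minimal particle-based sketch of this benchmark: a population of N samples approximates μ, its empirical average approximates m, and one Euler-Maruyama step discretizes the dynamics. The step size, volatility, initial distribution, and cost coefficients are placeholder assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, sigma = 0.01, 0.5                           # step size and volatility (assumed)
c1, c2, c3, c4, c5 = 0.25, 1.5, 0.5, 0.6, 1.0   # cost coefficients (assumed)

# Propagate a population of N particles one Euler-Maruyama step of
# dX_t = alpha_t dt + sigma dW_t and estimate the first moment m of the
# empirical mean field distribution.
N = 10_000
states = rng.standard_normal(N)   # X_0 ~ N(0, 1), an assumed initial law
actions = np.zeros(N)             # placeholder controls alpha_t
states = states + actions * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
m = states.mean()                 # empirical estimate of m = ∫ x μ(dx)

# Running cost evaluated at each particle.
costs = (0.5 * actions**2 + c1 * (states - c2 * m)**2
         + c3 * (states - c4)**2 + c5 * m**2)
print(m, costs.mean())
```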
Quotes
"The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function, which can be efficiently updated in an online fashion, and uses Langevin dynamics to obtain samples from the resulting distribution."
"The AC agent and the score function are updated iteratively to converge, either to the MFG equilibrium or the MFC optimum for a given mean field problem, depending on the choice of learning rates."