Core Concepts
The paper develops continuous-time q-learning for mean-field control problems by introducing an integrated q-function, which underpins model-free learning algorithms.
Summary
The paper studies q-learning in continuous time for mean-field control problems and introduces an integrated q-function. It contrasts the mean-field setting with single-agent control problems and discusses the role of test policies. The analysis reveals two distinct q-functions: the integrated q-function and the essential q-function. A weak martingale condition and a test-policy search method are proposed to obtain model-free learning algorithms, and examples in LQ control and beyond illustrate the algorithms' performance. The content is structured as follows:
- Introduction to mean-field control problems and reinforcement learning.
- Problem formulation with strong and exploratory control (a schematic exploratory objective is sketched after this list).
- Soft Q-learning in discrete time and its application to mean-field control.
- Two continuous time q-functions: integrated q-function and essential q-function.
- Relationship between q-functions and learning algorithms.
- Conclusion and future research directions.
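As background for the "exploratory control" item above, the following is a schematic entropy-regularized (exploratory) objective; the notation (finite horizon, exploration temperature γ, terminal reward g) is assumed here for illustration and is not copied from the paper:

```latex
% Schematic exploratory (entropy-regularized) mean-field objective; notation assumed.
% X_s: representative agent's state, \mu_s = Law(X_s), \pi: stochastic policy,
% \gamma > 0: exploration temperature, g: terminal reward.
J(\pi) = \mathbb{E}\!\left[ \int_0^T \Big( r(s, X_s, \mu_s, a_s)
        - \gamma \log \pi(a_s \mid s, X_s, \mu_s) \Big)\, ds
        + g(X_T, \mu_T) \right],
\qquad a_s \sim \pi(\cdot \mid s, X_s, \mu_s).
```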
Statistics
"The optimal policy can be explicitly written by π∗(a|t, x) = exp{ 1/γ Q∗single(t, x, a)} / Σ_a exp{ 1/γ Q∗single(t, x, a)}"
"Q∗single(t, x, a) = r(t, x, a) + βE[γ log Σ_a exp{ 1/γ Q∗single(t + 1, x', a)} | Xt = x, at = a]"
"Q∗(t, µ, h) = E[ r(ξ, µ, ah) - γ log h(ah|t, ξ, µ) + β sup_h' Q∗(t + 1, Φ(t, µ, h), h') ]"
Quotes
"Inspired by Jia and Zhou (2023), we are particularly interested in whether, and if yes, how the continuous time q-learning can be applied in learning McKean-Vlasov control problems in the mean-field model with infinitely many interacting agents."
"The weak martingale condition and test policy searching method are proposed for model-free learning algorithms."
"The integrated q-function actually cannot be utilized directly to learn the optimal policy."