Core Concepts
Stochastic approximation algorithms, such as stochastic gradient descent and temporal difference learning, can be analyzed by relating the discrete stochastic iterates to the trajectories of an ordinary differential equation (ODE). A key challenge is establishing the stability of the stochastic iterates, which is necessary to connect the discrete and continuous dynamics. This paper extends the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the more general Markovian noise setting, significantly broadening its applicability in reinforcement learning.
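The ODE viewpoint described above can be illustrated with a minimal sketch. The two-state chain, per-state targets, and step sizes below are hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov chain driving the noise.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # transition matrix
c = np.array([1.0, 5.0])             # per-state targets
# Stationary distribution pi solves pi P = pi; for this P, pi = (2/3, 1/3).
pi = np.array([2 / 3, 1 / 3])

# Stochastic approximation: x_{n+1} = x_n + a_n * f(x_n, Y_n),
# where Y_n is Markovian noise (NOT a martingale difference sequence).
x, y = 0.0, 0
for n in range(200_000):
    a_n = 1.0 / (n + 1)              # diminishing step sizes
    x += a_n * (c[y] - x)            # f(x, y) = c_y - x
    y = rng.choice(2, p=P[y])        # advance the Markov chain

# The mean ODE is dx/dt = E_pi[f(x, Y)] = (pi . c) - x, whose unique
# equilibrium is the stationary average pi . c; the iterates track it.
print(x, pi @ c)
```

With these step sizes the iterate is a running average of the visited targets, so by the ergodic theorem it converges to the stationary average `pi @ c`, matching the ODE's equilibrium.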
Abstract
The paper presents a stability analysis for stochastic approximation algorithms with Markovian noise. The key contributions are:
- Extending the Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, under weaker assumptions than prior work.
- Establishing stability under two alternative sets of assumptions:
  - Assumption 6, based on a form of the strong law of large numbers
  - Assumption 6', based on the Lyapunov drift condition (V4) and boundedness in L2
- Demonstrating the wide applicability of the results, especially to off-policy reinforcement learning algorithms with linear function approximation and eligibility traces.
The analysis centers on the diminishing asymptotic rate of change of certain functions, a property implied by both the strong law of large numbers and the Lyapunov drift condition. This makes it possible to establish the stability of the stochastic iterates without the stronger assumptions required in prior work.
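The strong law of large numbers underlying this property can be demonstrated numerically: for a function on the chain's state space, the centered partial sums grow sublinearly, so their rate of change diminishes. The chain and test function below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state chain and a test function on its state space.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([1.0, -1.0])
pi = np.array([2 / 3, 1 / 3])        # stationary distribution of P
f_bar = pi @ f                        # stationary mean of f

# SLLN for the chain: (1/n) * sum_{k<n} f(Y_k) -> f_bar almost surely,
# so the centered partial sums S_n satisfy |S_n| / n -> 0.
y, partial = 0, 0.0
rates = []
for n in range(1, 100_001):
    partial += f[y] - f_bar
    y = rng.choice(2, p=P[y])
    if n % 20_000 == 0:
        rates.append(abs(partial) / n)  # should shrink toward 0

print(rates)
```

The recorded values of `|S_n| / n` decay roughly like `1 / sqrt(n)`, which is the diminishing-rate-of-change behavior the analysis relies on.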
Quotes
"Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible."