Key Concepts
This work proposes a unified framework for analyzing a broad class of Markov chains, called Ito chains, that can model various sampling, optimization, and boosting algorithms. The authors bound the discretization error between the Ito chain and the corresponding Ito diffusion in the Wasserstein-2 (W2) distance, under weak and general assumptions on the chain's coefficients, including non-Gaussian and state-dependent noise.
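For reference, the generic objects involved can be written as follows (a schematic sketch in standard notation, not the paper's verbatim statement; the precise conditions on the coefficients are the paper's assumptions):

    % Ito chain: a discrete-time Markov chain with step size \gamma;
    % b_k and \sigma_k may depend on the state X_k, and the noise \xi_k
    % is zero-mean but need not be Gaussian.
    X_{k+1} = X_k + \gamma\, b_k + \sqrt{\gamma}\, \sigma_k \xi_k

    % Target Ito diffusion, with W_t a standard Wiener process:
    dY_t = b(Y_t)\, dt + \sigma(Y_t)\, dW_t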
Summary
The paper considers a general class of Markov chains called Ito chains, which can model a wide range of algorithms and techniques, including Langevin dynamics, stochastic gradient descent, and gradient boosting. The key contributions are:
Universality of the Ito chain equation: The authors show that a single Ito chain recursion describes various sampling, optimization, and boosting methods, providing a unified framework for their analysis (see the sketch after this list).
Weak and broad assumptions: The assumptions on the chain's coefficients are relatively weak, allowing non-Gaussian and state-dependent noise as well as drift terms that may be non-convex and non-dissipative.
Discretization error bounds: The authors bound the W2 distance between the laws of the Ito chain and the corresponding Ito diffusion. These bounds improve on or recover most of the known estimates in the literature, and in several settings the analysis is the first of its kind.
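To make the universality point concrete, here is a minimal sketch (ours, not the authors' code) of the shared recursion, with stochastic gradient descent and unadjusted Langevin dynamics as special cases; grad_f and the noise scale are hypothetical placeholders:

    import numpy as np

    def ito_chain_step(x, b, sigma, gamma, rng):
        # One step of the generic Ito chain:
        #   X_{k+1} = X_k + gamma * b(X_k) + sqrt(gamma) * sigma(X_k) * xi_k
        xi = rng.standard_normal(x.shape)  # Gaussian here for simplicity;
                                           # the framework allows other zero-mean noise
        return x + gamma * b(x) + np.sqrt(gamma) * sigma(x) * xi

    def grad_f(x):
        # hypothetical objective gradient, here for f(x) = ||x||^2 / 2
        return x

    rng = np.random.default_rng(0)
    gamma = 1e-2
    x = np.ones(3)

    # SGD as an Ito chain: drift -grad_f, diffusion term models gradient noise.
    x_sgd = ito_chain_step(x, lambda z: -grad_f(z), lambda z: 0.1, gamma, rng)

    # Unadjusted Langevin dynamics: drift -grad_f, constant diffusion sqrt(2).
    x_uld = ito_chain_step(x, lambda z: -grad_f(z), lambda z: np.sqrt(2.0), gamma, rng)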
The proof proceeds in three steps. First, the paper constructs an auxiliary chain with Gaussian noise that approximates the original non-Gaussian Ito chain. It then relates this auxiliary chain to the target Ito diffusion using a new version of the Girsanov theorem for mixed Ito/adapted coefficients. Finally, it converts the resulting KL divergence between the two processes into a W2 bound via an exponential integrability result.
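Schematically (constants suppressed; the auxiliary-chain notation \widetilde{X} is ours, not the paper's), the error splits by the triangle inequality as:

    W_2\big(\mathrm{Law}(X_k),\, \mathrm{Law}(Y_{\gamma k})\big)
      \le W_2\big(\mathrm{Law}(X_k),\, \mathrm{Law}(\widetilde{X}_k)\big)   % non-Gaussian chain vs. Gaussian-noise auxiliary chain
        + W_2\big(\mathrm{Law}(\widetilde{X}_k),\, \mathrm{Law}(Y_{\gamma k})\big)   % KL via Girsanov, then KL -> W_2 via exponential integrability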
The resulting bounds on the discretization error are expressed in terms of the chain's parameters, such as the Lipschitz constants of the coefficients, the noise properties, and the initial condition. The results cover a wide range of special cases, including Langevin dynamics, stochastic gradient descent, and gradient boosting algorithms.
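As a toy illustration of what such a discretization error looks like (our example, not the paper's), one can simulate a one-dimensional Ornstein-Uhlenbeck diffusion dY = -Y dt + sqrt(2) dW both exactly and via its Euler/Ito chain, then estimate the W2 distance between the two marginals empirically; in one dimension the empirical W2 is computed by sorting the samples:

    import numpy as np

    rng = np.random.default_rng(1)
    gamma, n_steps, n_samples = 0.05, 100, 20_000

    x = np.zeros(n_samples)  # Ito chain (Euler discretization)
    y = np.zeros(n_samples)  # exact OU transitions

    for _ in range(n_steps):
        # Chain: X_{k+1} = X_k - gamma * X_k + sqrt(2 * gamma) * xi
        x = x - gamma * x + np.sqrt(2 * gamma) * rng.standard_normal(n_samples)
        # Exact OU transition over a time interval of length gamma
        y = np.exp(-gamma) * y + np.sqrt(1 - np.exp(-2 * gamma)) * rng.standard_normal(n_samples)

    # Empirical W2 between the two marginal samples (1D optimal coupling = sorting)
    w2 = np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))
    print(f"empirical W2 at time T = {gamma * n_steps:.1f}: {w2:.4f}")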