The paper considers the nonconvex stochastic optimization problem of minimizing the expected risk u(θ) = E[U(θ, X)], where U is a measurable function and X is an m-dimensional random variable. The authors focus on an alternative method for sampling from the target distribution π_β, which is the unique invariant measure of the associated Langevin stochastic differential equation.
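The link between sampling and optimization rests on the standard Gibbs-type form of the target: for large inverse temperature β, π_β concentrates around the minimizers of u, so approximate samples are approximate minimizers. Below is a minimal sketch of this target together with the kinetic (underdamped) Langevin dynamics usually associated with SGHMC, whose invariant measure has θ-marginal π_β; the paper's exact normalization, friction parameter γ, and noise scaling may differ.

```latex
% Assumed standard forms; the paper's exact constants may differ.
\begin{align*}
  \pi_\beta(\mathrm{d}\theta) &\propto \exp\bigl(-\beta\, u(\theta)\bigr)\,\mathrm{d}\theta,
      \qquad \beta > 0, \\[4pt]
  \mathrm{d}\theta_t &= V_t\,\mathrm{d}t, \\
  \mathrm{d}V_t &= -\bigl(\gamma V_t + \nabla u(\theta_t)\bigr)\,\mathrm{d}t
      + \sqrt{2\gamma\beta^{-1}}\,\mathrm{d}B_t.
\end{align*}
```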
The authors introduce the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm, which can be viewed as the sampling analogue of stochastic gradient methods with momentum. Crucially, the authors allow the stochastic gradient H to be discontinuous, requiring only that it satisfy a continuity-in-average condition.
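As a concrete illustration, the following is a minimal sketch of one common Euler-type SGHMC discretization from this line of work; the names grad_H, sample_x, lam (step size), gamma (friction), and beta (inverse temperature) are illustrative, and the paper's exact recursion may differ.

```python
import numpy as np

def sghmc(theta0, grad_H, sample_x, lam=1e-2, gamma=1.0, beta=1e4,
          n_iter=10_000, rng=None):
    """Illustrative Euler-type SGHMC sketch (not the paper's exact scheme).

    theta0   : initial parameter vector
    grad_H   : stochastic gradient H(theta, x), possibly discontinuous in theta
    sample_x : callable returning a fresh data sample X_{n+1}
    lam      : step size; gamma : friction; beta : inverse temperature
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)                      # momentum component
    noise_scale = np.sqrt(2.0 * gamma * lam / beta)
    for _ in range(n_iter):
        x = sample_x()
        # Momentum update: friction, stochastic gradient, injected Gaussian noise.
        v = v - lam * (gamma * v + grad_H(theta, x)) \
            + noise_scale * rng.standard_normal(theta.shape)
        # Parameter update driven by the momentum.
        theta = theta + lam * v
    return theta
```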
Under this setting, the authors provide non-asymptotic convergence bounds, in both the Wasserstein-1 and Wasserstein-2 distances, between the law of the n-th iterate of the SGHMC algorithm and the target distribution π_β. These bounds further yield a non-asymptotic upper bound on the expected excess risk of the associated optimization problem.
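For reference, the quantities appearing in these results are the standard ones sketched below (definitions only; the paper's explicit constants and convergence rates are not reproduced here), where d denotes the dimension of θ and θ_n the n-th SGHMC iterate.

```latex
\begin{align*}
  W_p(\mu,\nu) &= \Bigl( \inf_{\zeta \in \Pi(\mu,\nu)}
      \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^p \,\zeta(\mathrm{d}x,\mathrm{d}y) \Bigr)^{1/p},
      \qquad p \in \{1, 2\}, \\[4pt]
  \text{expected excess risk} &= \mathbb{E}\bigl[u(\theta_n)\bigr]
      - \inf_{\theta \in \mathbb{R}^d} u(\theta),
\end{align*}
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$.
```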
To illustrate the applicability of their results, the authors present three key examples from statistical machine learning in which the corresponding stochastic gradient H is discontinuous: quantile estimation, optimization involving ReLU neural networks, and hedging under asymmetric risk. Numerical results support the theoretical findings and demonstrate the superiority of the SGHMC algorithm over its stochastic gradient Langevin dynamics (SGLD) counterpart.
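To make the discontinuity concrete, here is a small illustrative example for the quantile-estimation case, assuming the standard pinball-loss formulation u(θ) = E[ρ_q(X − θ)]; the function and variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def H(theta, x, q=0.9):
    # Stochastic gradient of the pinball loss rho_q(x - theta) with respect to theta:
    # piecewise constant in theta with a jump at theta = x, hence discontinuous.
    return float(x < theta) - q

rng = np.random.default_rng(0)
xs = rng.standard_normal(100_000)      # X ~ N(0, 1)

# The averaged gradient E[H(theta, X)] = P(X < theta) - q is continuous in theta
# ("continuity in average"), and its root is the q-quantile of X (about 1.2816 for q = 0.9).
for theta in (1.20, 1.28, 1.36):
    print(theta, np.mean(xs < theta) - 0.9)
```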
Source: Luxu Liang et al., arXiv, 2024-09-26, https://arxiv.org/pdf/2409.17107.pdf