Core Concepts
The authors prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Their approach accelerates any stochastic gradient method that makes sufficient progress in expectation, and the proof applies to both convex and strongly convex functions.
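To make "interpolation" and "sufficient progress in expectation" concrete, the following worked derivation uses the standard textbook forms of these conditions. The notation is an assumption and does not come from the summary above: $f = \frac{1}{n}\sum_i f_i$ is $L$-smooth, $\rho$ is the strong growth constant, $\eta$ is the step size, and $i$ is sampled uniformly so that $\mathbb{E}_i[\nabla f_i] = \nabla f$. The paper's exact statements may differ in constants or generality.

```latex
% Interpolation (assumed standard form): a stationary point of f is stationary for every f_i.
\nabla f(w^*) = 0 \;\implies\; \nabla f_i(w^*) = 0 \quad \text{for all } i.

% Strong growth condition with constant \rho (assumed standard form):
\mathbb{E}_i \|\nabla f_i(w)\|^2 \;\le\; \rho \, \|\nabla f(w)\|^2 .

% One SGD step from y, using L-smoothness of f and unbiasedness of \nabla f_i:
\begin{align*}
\mathbb{E}_i\!\left[ f\big(y - \eta \nabla f_i(y)\big) \right]
  &\le f(y) - \eta \|\nabla f(y)\|^2
       + \frac{L \eta^2}{2} \, \mathbb{E}_i \|\nabla f_i(y)\|^2 \\
  &\le f(y) - \eta \left( 1 - \frac{\rho L \eta}{2} \right) \|\nabla f(y)\|^2 .
\end{align*}

% Choosing \eta = 1/(\rho L) gives an expected decrease proportional to the squared gradient norm:
\mathbb{E}_i\!\left[ f\big(y - \eta \nabla f_i(y)\big) \right]
  \;\le\; f(y) - \frac{1}{2 \rho L} \|\nabla f(y)\|^2 .
```

Under these assumptions, plain SGD is one example of a primal update whose expected decrease scales with $\|\nabla f(y)\|^2$, which is the kind of guarantee a sufficient progress condition asks for.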
Abstract
The key highlights and insights from the content are:
The authors introduce a generalized version of stochastic Nesterov acceleration that can be applied to any stochastic gradient method making sufficient progress in expectation.
They prove that under interpolation conditions, this generalized stochastic accelerated gradient descent (AGD) scheme can achieve faster convergence rates than standard stochastic gradient descent (SGD).
The authors' analysis uses the estimating sequences framework and shows that, as long as the primal update (e.g., SGD) satisfies a sufficient progress condition, the generalized stochastic AGD scheme attains an accelerated convergence rate.
Specializing the results to standard stochastic AGD, the authors show an improved dependence on the strong growth constant compared to prior work. This improvement can be larger than the square root of the condition number.
The authors also extend their analysis to stochastic AGD with preconditioning, demonstrating that preconditioning can further speed up accelerated stochastic optimization when the stochastic gradients are well-conditioned in the preconditioner's norm.
The authors compare their convergence guarantees to existing results in the literature, highlighting settings where their stochastic AGD scheme achieves acceleration over standard SGD.
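The structure described in the highlights above, a momentum scheme wrapped around a pluggable primal update, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: it uses a simple constant-momentum form rather than the estimating sequences parameterization, and the names `generalized_stochastic_agd`, `primal_step`, and `sgd_step`, the toy problem, and the hand-picked momentum value are all assumptions made for the example.

```python
import random


def generalized_stochastic_agd(primal_step, w0, n, momentum, iters, seed=0):
    """Nesterov-style momentum wrapped around a pluggable stochastic primal update.

    primal_step(y, i) returns the next iterate from the extrapolated point y
    and a sampled component index i. Plain SGD is one possible choice.
    """
    rng = random.Random(seed)
    w_prev, w = w0, w0
    for _ in range(iters):
        y = w + momentum * (w - w_prev)   # extrapolate along the last displacement
        i = rng.randrange(n)              # sample one component uniformly at random
        w_prev, w = w, primal_step(y, i)  # primal update taken at the extrapolated point
    return w


# Toy interpolating problem (hypothetical): f_i(w) = 0.5 * a_i * (w - 3)^2.
# Every f_i is minimized at w = 3, so all stochastic gradients vanish there.
curvatures = [1.0, 2.0, 3.0]
w_star = 3.0


def sgd_step(y, i, step=3.0 / 7.0):
    # For this problem L = mean(a_i) = 2 and the strong growth constant is
    # rho = mean(a_i^2) / mean(a_i)^2 = 7/6, so step = 1 / (rho * L) = 3/7.
    grad_i = curvatures[i] * (y - w_star)  # gradient of f_i at y
    return y - step * grad_i


w_final = generalized_stochastic_agd(
    sgd_step, w0=0.0, n=len(curvatures), momentum=0.2, iters=200
)
print(abs(w_final - w_star))  # distance to the shared minimizer
```

Passing the primal update as a callable mirrors the idea that the outer scheme only needs the update to make progress in expectation. A preconditioned step could be substituted for `sgd_step` without touching the outer loop. The momentum value 0.2 is deliberately conservative so that the iterates contract for every sampling order on this toy problem; it is not a tuned or theoretically prescribed choice.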