Core Concepts
The stochastic heavy ball (SHB) method with a step decay learning-rate scheduler achieves accelerated convergence on quadratic objectives under anisotropic gradient noise.
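As a concrete illustration, here is a minimal sketch of SHB with a step decay schedule on a synthetic quadratic objective with anisotropic gradient noise; the Hessian spectrum, noise scales, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm or constants) of
# stochastic heavy ball (SHB) with a step decay learning-rate schedule on a
# quadratic objective f(x) = 0.5 * x^T H x under anisotropic gradient noise.
import numpy as np

rng = np.random.default_rng(0)

d = 10
eigs = np.logspace(0, -3, d)                  # anisotropic spectrum, condition number kappa = 1e3
H = np.diag(eigs)                             # Hessian of the quadratic objective
noise_std = np.logspace(-1, -3, d)            # anisotropic (per-coordinate) gradient-noise scale

def stochastic_grad(x):
    """Exact gradient H x plus anisotropic Gaussian noise."""
    return H @ x + noise_std * rng.standard_normal(d)

def shb_step_decay(x0, eta0=0.9, beta=0.9, n_stages=5, steps_per_stage=200):
    """Heavy-ball update x_{t+1} = x_t - eta_t * g_t + beta * (x_t - x_{t-1}),
    with the learning rate halved after every stage (step decay)."""
    x_prev, x = x0.copy(), x0.copy()
    eta = eta0
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            g = stochastic_grad(x)
            x_next = x - eta * g + beta * (x - x_prev)
            x_prev, x = x, x_next
        eta *= 0.5                            # step decay between stages
    return x

x_final = shb_step_decay(np.ones(d))
print("final objective value:", 0.5 * x_final @ H @ x_final)
```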
Abstract
ABSTRACT:
Heavy-ball momentum with decaying learning rates is effective for optimizing deep learning models.
The paper fills the gap in theoretical understanding of its convergence properties under anisotropic gradient noise.
It establishes accelerated convergence with near-optimal rates, which is particularly beneficial in large-batch settings.
INTRODUCTION:
Optimization techniques for training large models are crucial.
Stochastic gradient descent (SGD) and variants like heavy-ball methods are widely used.
The empirical success of heavy-ball momentum contrasts with the limited theoretical results showing a provable advantage over plain SGD.
EXPERIMENTS:
Ridge regression experiments show that SHB outperforms SGD, especially with a step decay schedule.
Image classification experiments on CIFAR-10 demonstrate that SHB delivers significant acceleration and improved performance over SGD.
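A minimal sketch of the kind of comparison described above: SHB and plain SGD run on a synthetic ridge regression problem under the same step decay schedule. The data generation, ridge penalty, and hyperparameters are assumptions for illustration, not the paper's experimental configuration.

```python
# Sketch: SHB vs. plain SGD on synthetic ridge regression with step decay.
# All constants below are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 2000, 50, 1e-3                    # samples, dimension, ridge penalty
A = rng.standard_normal((n, d)) * np.logspace(0, -2, d)  # ill-conditioned features
w_true = rng.standard_normal(d)
y = A @ w_true + 0.1 * rng.standard_normal(n)

def minibatch_grad(w, batch=32):
    """Minibatch gradient of the ridge objective 0.5*mean((Aw - y)^2) + 0.5*lam*|w|^2."""
    idx = rng.integers(0, n, size=batch)
    Ab, yb = A[idx], y[idx]
    return Ab.T @ (Ab @ w - yb) / batch + lam * w

def run(beta, eta0=0.2, n_stages=4, steps_per_stage=500):
    """beta = 0 recovers plain SGD; beta > 0 gives stochastic heavy ball (SHB)."""
    w_prev = w = np.zeros(d)
    eta = eta0
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            w_next = w - eta * minibatch_grad(w) + beta * (w - w_prev)
            w_prev, w = w, w_next
        eta *= 0.5                            # step decay between stages
    return w

def ridge_loss(w):
    return 0.5 * np.mean((A @ w - y) ** 2) + 0.5 * lam * np.sum(w ** 2)

print("SGD final loss:", ridge_loss(run(beta=0.0)))
print("SHB final loss:", ridge_loss(run(beta=0.9)))
```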
Stats
"Heavy-ball momentum can provide ˜O(√κ) accelerated convergence."
"SGD requires at least Ω(κ) iterations to reduce excess risk by a factor of c."
Quotes
"We fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods."
"Our paper gives a positive answer to this question."