The paper presents a theoretical analysis of stochastic heavy-ball methods with decaying learning rates on quadratic objectives under anisotropic gradient noise. It fills a gap in our understanding of the acceleration potential of heavy-ball momentum in large-batch settings. Introducing novel theoretical techniques, the paper establishes a non-asymptotic convergence bound for stochastic heavy-ball methods, showing that they can outperform plain SGD. The results indicate that properly chosen learning-rate schedules can significantly speed up large-batch training, with practical implications for distributed machine learning and federated learning.
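To make the setting concrete, here is a minimal sketch of the stochastic heavy-ball update with a decaying learning-rate schedule, applied to a quadratic objective. The specific schedule (inverse-time decay), the momentum value, and the quadratic test problem are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def heavy_ball(grad, x0, beta=0.9, eta0=0.1, decay=0.01, steps=100):
    """Stochastic heavy-ball iteration:
        v_{t+1} = beta * v_t - eta_t * g_t
        x_{t+1} = x_t + v_{t+1}
    with an (assumed) inverse-time decaying step size eta_t."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        eta_t = eta0 / (1.0 + decay * t)  # decaying learning-rate schedule
        g = grad(x)                       # (possibly noisy) gradient oracle
        v = beta * v - eta_t * g
        x = x + v
    return x

# Example: quadratic objective f(x) = 0.5 * x^T A x with an
# ill-conditioned (anisotropic) Hessian; grad f(x) = A x.
A = np.diag([1.0, 10.0])
x_final = heavy_ball(lambda x: A @ x, x0=[1.0, 1.0])
```

In the large-batch regime the gradient noise shrinks, so the deterministic oracle above is a reasonable stand-in for intuition; the paper's analysis concerns how momentum plus the schedule interact with anisotropic noise around such quadratics.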
Key insights from the paper by Rui Pan, Yuxi..., arxiv.org, 03-19-2024
https://arxiv.org/pdf/2312.14567.pdf