The paper presents a theoretical analysis of stochastic heavy-ball methods with decaying learning rates on quadratic objectives under anisotropic gradient noise. It fills a gap in understanding whether heavy-ball momentum can provide acceleration in large-batch settings. By introducing new theoretical techniques, the paper establishes non-asymptotic convergence bounds for stochastic heavy-ball methods and shows that they can outperform plain SGD in this regime. The results indicate that a properly chosen learning rate schedule can substantially speed up large-batch training, with practical implications for distributed machine learning and federated learning.
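To make the setting concrete, below is a minimal sketch (not the paper's exact algorithm or constants) of stochastic heavy-ball iteration on a quadratic objective f(x) = ½ xᵀHx with anisotropic gradient noise and a step-decay learning rate; the spectrum, noise covariance, momentum coefficient, and schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
eigs = np.logspace(0, -3, d)                 # ill-conditioned Hessian spectrum (assumed)
H = np.diag(eigs)                            # quadratic objective f(x) = 0.5 * x^T H x
noise_cov = np.diag(np.logspace(0, -2, d))   # anisotropic gradient-noise covariance (assumed)

beta = 0.9        # heavy-ball momentum coefficient (assumed)
eta0 = 0.5        # initial learning rate (assumed)
T = 2000

x = rng.normal(size=d)
x_prev = x.copy()
for t in range(T):
    eta = eta0 * 0.5 ** (t // 500)           # step-decay schedule (assumed)
    # stochastic gradient: exact gradient H @ x plus anisotropic Gaussian noise
    grad = H @ x + rng.multivariate_normal(np.zeros(d), noise_cov)
    # heavy-ball update: gradient step plus momentum term beta * (x_t - x_{t-1})
    x_next = x - eta * grad + beta * (x - x_prev)
    x_prev, x = x, x_next

print("final loss:", 0.5 * x @ H @ x)
```

In this toy run, the momentum term carries information from previous iterates, which is what allows heavy-ball methods to make faster progress along the poorly conditioned directions than a plain stochastic gradient step with the same schedule.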