The paper presents a theoretical analysis of stochastic heavy-ball methods with decaying learning rates on quadratic objectives under anisotropic gradient noise. It fills a gap in understanding whether heavy-ball momentum can provide acceleration in large-batch settings. Using novel proof techniques, the paper establishes a non-asymptotic convergence bound for stochastic heavy-ball methods, showing an advantage over plain SGD. The results indicate that properly chosen learning rate schedules can significantly speed up large-batch training, with practical implications for distributed machine learning and federated learning.
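The setting described above can be illustrated with a minimal sketch: heavy-ball (Polyak momentum) SGD with a decaying step size, run on a quadratic objective whose Hessian is ill-conditioned and whose gradient noise differs per coordinate (anisotropic noise). All concrete values here (the Hessian, noise scales, momentum, and the decay schedule) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic objective f(x) = 0.5 * x^T H x with an ill-conditioned
# diagonal Hessian; per-coordinate noise scales mimic anisotropic
# gradient noise. These values are illustrative, not from the paper.
H = np.diag([100.0, 1.0])
noise_scale = np.array([1.0, 0.1])

def noisy_grad(x):
    """Stochastic gradient: true gradient H @ x plus anisotropic noise."""
    return H @ x + noise_scale * rng.normal(size=x.shape)

def heavy_ball(x0, beta=0.9, eta0=0.01, steps=2000):
    """Heavy-ball momentum with a decaying learning rate:
        v_{t+1} = beta * v_t + g_t
        x_{t+1} = x_t - eta_t * v_{t+1},   eta_t = eta0 / (1 + t/100)
    """
    x, v = x0.copy(), np.zeros_like(x0)
    for t in range(steps):
        eta = eta0 / (1.0 + t / 100.0)
        v = beta * v + noisy_grad(x)
        x = x - eta * v
    return x

x_final = heavy_ball(np.array([1.0, 1.0]))
print(np.linalg.norm(x_final))  # distance to the minimizer x* = 0
```

With the decaying schedule, the iterate approaches the minimizer despite persistent anisotropic noise; with a constant step size it would instead hover in a noise-dominated region, which is the trade-off the paper's convergence bounds quantify.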