The paper presents a theoretical analysis of stochastic heavy-ball methods with decaying learning rates on quadratic objectives under anisotropic gradient noise, filling a gap in the understanding of heavy-ball momentum's acceleration potential in large-batch settings. Using novel proof techniques, it establishes a non-asymptotic convergence bound for stochastic heavy-ball methods and demonstrates their advantage over vanilla SGD. The results show that a properly chosen learning-rate schedule can significantly speed up large-batch training, with practical implications for distributed machine learning and federated learning.
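To make the setting concrete, here is a minimal sketch (not the paper's actual algorithm or constants; the quadratic, schedule, and hyperparameters are illustrative assumptions) of the stochastic heavy-ball update with a step-decay learning rate on a one-dimensional quadratic with gradient noise:

```python
import random

def heavy_ball_quadratic(a=2.0, x0=5.0, beta=0.9, eta0=0.4,
                         noise=0.01, steps=200, seed=0):
    """Minimize f(x) = (a/2) * x^2 with the stochastic heavy-ball update
        x_{t+1} = x_t - eta_t * g_t + beta * (x_t - x_{t-1}),
    where g_t is a noisy gradient and eta_t follows a step-decay schedule.
    All constants here are illustrative, not taken from the paper."""
    rng = random.Random(seed)
    x_prev, x = x0, x0
    for t in range(steps):
        eta = eta0 * 0.5 ** (t // 50)       # halve the learning rate every 50 steps
        g = a * x + rng.gauss(0.0, noise)   # gradient of the quadratic plus noise
        x, x_prev = x - eta * g + beta * (x - x_prev), x
    return x

print(abs(heavy_ball_quadratic()))  # close to the minimizer x* = 0
```

The momentum term `beta * (x - x_prev)` is what distinguishes heavy ball from plain SGD; the paper's analysis concerns how fast such iterates contract toward the optimum when the gradient noise is anisotropic and the step size decays.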
Key insights extracted from the source content by Rui Pan, Yuxi... on arxiv.org, 03-19-2024
https://arxiv.org/pdf/2312.14567.pdf