This research paper introduces a novel approach to enhancing the Stochastic Cubic Newton (SCN) method for non-convex optimization by incorporating momentum into both its gradient and Hessian estimates. The authors address a critical limitation of existing SCN methods, which struggle to converge for small batch sizes because of noise in the gradient and Hessian estimates.
The paper highlights the challenge of controlling noise in second-order methods, in particular the (3/2)-th moment of the gradient noise and the third moment of the Hessian noise. The authors propose combining Implicit Gradient Transport (IT) momentum for the gradient estimates with Heavy Ball (HB) momentum for the Hessian estimates. By reusing past estimates, this approach effectively simulates large batches and thereby reduces the impact of noise.
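To make the proposed update concrete, the sketch below shows one way a single SCN step with these two momentum estimators could be written: the stochastic gradient is evaluated at an extrapolated point before exponential averaging (the implicit-transport idea), while the stochastic Hessian is averaged with plain heavy-ball weights. The parameter names `alpha`, `beta`, and `M`, and the bisection-based `cubic_subproblem` solver, are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np


def cubic_subproblem(g, H, M, r_max=1e6, tol=1e-8, max_iters=100):
    """Approximately minimize <g, s> + 0.5 s^T H s + (M/6) ||s||^3.

    Uses the classical characterization s = -(H + (M/2) r I)^{-1} g with
    r = ||s||, found here by bisection on r. This covers the typical
    (non-degenerate) case; a production solver would also treat the
    so-called hard case separately.
    """
    d = g.shape[0]
    lo, hi = 0.0, r_max
    s = np.zeros(d)
    for _ in range(max_iters):
        r = 0.5 * (lo + hi)
        try:
            s = np.linalg.solve(H + 0.5 * M * r * np.eye(d), -g)
        except np.linalg.LinAlgError:
            lo = r  # shifted matrix singular: the trial radius r is too small
            continue
        if np.linalg.norm(s) > r:
            lo = r  # step longer than the trial radius: increase r
        else:
            hi = r  # step shorter than the trial radius: decrease r
        if hi - lo < tol:
            break
    return s


def momentum_scn_step(x_prev, x, m_prev, H_prev, grad_fn, hess_fn, sample,
                      alpha=0.1, beta=0.1, M=10.0):
    """One sketched SCN iteration with momentum on both estimates.

    grad_fn(point, sample) and hess_fn(point, sample) return a stochastic
    gradient and Hessian; m_prev and H_prev are the previous momentum
    estimates. Names and constants are illustrative, not the paper's.
    """
    # Implicit-transport style gradient momentum: evaluate the stochastic
    # gradient at an extrapolated point, then average exponentially.
    z = x + ((1.0 - alpha) / alpha) * (x - x_prev)
    m = (1.0 - alpha) * m_prev + alpha * grad_fn(z, sample)

    # Heavy-ball (exponential) averaging of stochastic Hessians.
    H = (1.0 - beta) * H_prev + beta * hess_fn(x, sample)

    # Cubic-regularized Newton step around the current iterate.
    s = cubic_subproblem(m, H, M)
    return x + s, m, H
```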
The theoretical analysis shows that this momentum-based SCN method achieves better convergence rates than plain SCN, particularly in the small-batch regime. Notably, the method is guaranteed to converge for any batch size, even when only one sample is used per iteration, whereas existing SCN analyses required large batches to keep the noise under control. This closes a significant gap between first-order and second-order stochastic optimization methods, since first-order methods such as SGD were already known to tolerate single-sample updates.
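For context, non-convex rates in the cubic-Newton literature are typically stated in terms of approximate second-order stationarity; the criterion below is the standard one going back to Nesterov and Polyak and is given only as background, since the paper's precise statements may differ.

```latex
% Standard target in cubic-regularization analyses (background only):
% find a point x that is an (\varepsilon_g, \varepsilon_H)-approximate
% second-order stationary point,
\|\nabla f(x)\| \le \varepsilon_g
\quad\text{and}\quad
\lambda_{\min}\!\bigl(\nabla^2 f(x)\bigr) \ge -\varepsilon_H,
% which the deterministic cubic Newton method reaches after
% O\bigl(\max\{\varepsilon_g^{-3/2}, \varepsilon_H^{-3}\}\bigr) iterations.
```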
The paper also presents an extension of the momentum-based SCN method to convex optimization problems, demonstrating similar improvements in convergence rates. The authors suggest that this momentum technique could be combined with acceleration methods to further enhance performance in the convex case.
The practical significance of the proposed method is validated through experiments on logistic regression with non-convex regularization using the A9A and MNIST datasets. The results confirm that incorporating momentum into SCN leads to faster convergence and reduced variance compared to standard SCN and Stochastic Gradient Descent (SGD).
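As an illustration of the kind of objective used in such experiments, the sketch below implements logistic loss with a commonly used non-convex regularizer of the form w_j^2 / (1 + w_j^2); this regularizer and the weight `lam` are assumptions based on standard benchmarks in this literature, not necessarily the paper's exact setup.

```python
import numpy as np


def nonconvex_logreg_loss(w, X, y, lam=1e-3):
    """Logistic loss with a non-convex regularizer (illustrative benchmark).

    X has shape (n, d), labels y are assumed to lie in {-1, +1}, and lam
    weights the regularizer; the paper's exact constants may differ.
    """
    margins = y * (X @ w)
    # Numerically stable log(1 + exp(-margin)), averaged over the samples.
    loss = np.mean(np.logaddexp(0.0, -margins))
    # Non-convex regularizer: each weight contributes w_j^2 / (1 + w_j^2).
    reg = lam * np.sum(w ** 2 / (1.0 + w ** 2))
    return loss + reg
```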
The authors acknowledge that adaptive strategies for choosing the momentum hyperparameters remain an open direction for future research. Overall, this paper makes a valuable contribution to stochastic optimization by demonstrating the potential of momentum-based techniques for improving the efficiency and effectiveness of second-order methods, particularly in large-scale machine learning applications.
Key insights extracted from arxiv.org, by El Mahdi Cha..., 10-28-2024: https://arxiv.org/pdf/2410.19644.pdf