This research paper proves almost sure convergence to stationary points for a class of stochastic optimization algorithms inspired by Hamiltonian dynamics, under several smoothness and noise conditions, including L-smoothness, (L0, L1)-smoothness, and heavy-tailed gradient noise.
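For context, the conditions named above have standard formulations in the literature; a common way to state them is sketched below (the paper's exact assumptions may differ in detail).

```latex
% (L0, L1)-smoothness: the Hessian norm may grow with the gradient norm;
% taking L1 = 0 recovers classical L-smoothness.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| \qquad \text{for all } x .
\]
% Heavy-tailed gradient noise is often modeled by a bounded p-th moment with
% p in (1, 2], so the noise variance may be infinite:
\[
  \mathbb{E}\,\big\| g(x) - \nabla f(x) \big\|^{p} \;\le\; \sigma^{p},
  \qquad p \in (1, 2].
\]
```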
This paper proposes SMAG, a single-loop stochastic algorithm for a class of non-smooth, non-convex DMax optimization problems, and proves that it attains the same non-asymptotic convergence rate as the current best algorithms.
This paper introduces SMAG, a novel single-loop stochastic algorithm designed to efficiently solve a class of non-smooth, non-convex optimization problems, specifically focusing on the difference of weakly convex functions with a max-structure (DMax).
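As a rough illustration of the problem class (a generic template, not necessarily the exact formulation studied in the paper), a DMax objective can be written as a difference of two max-structured, weakly convex components:

```latex
% Illustrative DMax template: a difference of two pointwise maxima, each
% weakly convex in x; neither component needs to be smooth.
\[
  \min_{x \in \mathbb{R}^d} \; F(x)
  \;=\;
  \max_{y \in \mathcal{Y}} \phi(x, y)
  \;-\;
  \max_{z \in \mathcal{Z}} \psi(x, z) .
\]
```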
Incorporating a specific type of momentum into the Stochastic Cubic Newton method significantly improves its convergence rate for non-convex optimization problems, enabling convergence for any batch size, including single-sample batches.
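A minimal sketch of the idea, assuming heavy-ball-style momentum averaging of both the stochastic gradient and Hessian estimates feeding a cubic-regularized Newton step; the toy objective, the crude subproblem solver, and all hyperparameters are illustrative choices, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad_hess(x, noise=0.1):
    """Noisy gradient and Hessian of the toy objective
    f(x) = 0.25 * ||x||^4 - 0.5 * ||x||^2 (mimicking batch size 1)."""
    sq = x @ x
    grad = (sq - 1.0) * x + noise * rng.standard_normal(x.shape)
    hess = (sq - 1.0) * np.eye(x.size) + 2.0 * np.outer(x, x)
    hess = hess + noise * rng.standard_normal((x.size, x.size))
    return grad, 0.5 * (hess + hess.T)        # keep the estimate symmetric

def solve_cubic_subproblem(g, H, M, iters=200, lr=0.05):
    """Approximately minimize the cubic model
    m(s) = g^T s + 0.5 * s^T H s + (M / 6) * ||s||^3 by gradient descent."""
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s - lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s

def momentum_cubic_newton(x0, steps=50, beta=0.2, M=10.0):
    """Cubic Newton outer loop driven by momentum-averaged estimates."""
    x = x0.copy()
    g_bar, H_bar = stochastic_grad_hess(x)
    for _ in range(steps):
        g, H = stochastic_grad_hess(x)
        g_bar = (1 - beta) * g_bar + beta * g   # momentum on the gradient
        H_bar = (1 - beta) * H_bar + beta * H   # momentum on the Hessian
        x = x + solve_cubic_subproblem(g_bar, H_bar, M)
    return x

x = momentum_cubic_newton(rng.standard_normal(5))
print("final gradient norm:", np.linalg.norm(((x @ x) - 1.0) * x))
```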
This paper introduces novel sign-based stochastic variance reduction algorithms for non-convex optimization, achieving improved convergence rates compared to existing sign-based methods, both in centralized and distributed settings.
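A hedged sketch of how a sign-based step can be combined with a variance-reduced gradient estimator, here a STORM-style recursive-momentum estimator on a toy least-squares problem; this shows the generic construction rather than the specific algorithms proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
b = rng.standard_normal(200)

def stochastic_grad(x, idx):
    """Mini-batch gradient of the toy loss 0.5 * ||A x - b||^2 / n."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def sign_vr_sgd(steps=2000, lr=1e-3, a=0.1, batch=1):
    x = np.zeros(A.shape[1])
    d = stochastic_grad(x, rng.integers(0, A.shape[0], size=batch))
    for _ in range(steps):
        x_prev, x = x, x - lr * np.sign(d)            # sign-based step
        idx = rng.integers(0, A.shape[0], size=batch)
        # STORM-style recursion: the same sample is evaluated at both iterates,
        # so the correction term cancels much of the sampling noise.
        d = stochastic_grad(x, idx) + (1 - a) * (d - stochastic_grad(x_prev, idx))
    return x

x = sign_vr_sgd()
print("full-gradient norm:", np.linalg.norm(A.T @ (A @ x - b) / A.shape[0]))
```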
This research paper establishes a quantitative error estimate between Stochastic Gradient Descent with Momentum (SGDm) and Underdamped Langevin Diffusion in terms of the 1-Wasserstein and total variation distances, demonstrating the close relationship between the two dynamics.
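The two dynamics being compared can be written side by side; the sketch below uses a toy quadratic objective and illustrative step-size, friction, and temperature values, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
H = np.diag(np.linspace(0.5, 2.0, d))          # toy quadratic f(x) = 0.5 * x^T H x
grad = lambda x: H @ x

def sgdm(steps=500, lr=0.05, beta=0.9, noise=0.1):
    """SGD with (heavy-ball) momentum and additive gradient noise."""
    x, m = rng.standard_normal(d), np.zeros(d)
    for _ in range(steps):
        g = grad(x) + noise * rng.standard_normal(d)
        m = beta * m + g
        x = x - lr * m
    return x

def underdamped_langevin(steps=500, h=0.05, gamma=2.0, temp=0.01):
    """Euler-Maruyama discretization of
    dX = V dt,  dV = (-gamma * V - grad f(X)) dt + sqrt(2 * gamma * temp) dW."""
    x, v = rng.standard_normal(d), np.zeros(d)
    for _ in range(steps):
        v = v + h * (-gamma * v - grad(x)) \
            + np.sqrt(2.0 * gamma * temp * h) * rng.standard_normal(d)
        x = x + h * v
    return x

print("SGDm final |x|:", np.linalg.norm(sgdm()))
print("ULD  final |x|:", np.linalg.norm(underdamped_langevin()))
```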
This paper proposes and analyzes two novel consensus-based particle swarm optimization algorithms for solving stochastic optimization problems, leveraging mean-field approximations to establish their theoretical foundations and convergence properties.
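A minimal sketch of the standard consensus-based optimization (CBO) dynamics underlying such methods: particles drift toward a Gibbs-weighted consensus point and receive noise scaled to their distance from it. This illustrates the general mechanism, not the two specific algorithms analyzed in the paper; the test function and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def rastrigin(X):
    """Classic non-convex test objective, evaluated row-wise on an (n, d) array."""
    return 10.0 * X.shape[1] + np.sum(X**2 - 10.0 * np.cos(2.0 * np.pi * X), axis=1)

def cbo(n_particles=200, dim=5, steps=1000, dt=0.01, lam=1.0, sigma=1.0, alpha=100.0):
    X = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    for _ in range(steps):
        f = rastrigin(X)
        w = np.exp(-alpha * (f - f.min()))              # Gibbs weights
        x_bar = (w[:, None] * X).sum(axis=0) / w.sum()  # weighted consensus point
        diff = X - x_bar
        # Drift toward the consensus point plus distance-scaled (anisotropic) noise.
        X = X - lam * dt * diff + sigma * np.sqrt(dt) * diff * rng.standard_normal(X.shape)
    return x_bar

print("consensus point:", np.round(cbo(), 3))
```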
This paper proposes FINDER, a novel stochastic optimizer that combines quasi-Newton-style updates based on stochastic filtering with adaptations for large-scale problems, and shows its potential for deep-learning training and other applications.
This paper introduces FINDER, a novel stochastic optimizer that combines the advantages of quasi-Newton methods and noise-assisted global search, demonstrating its effectiveness in high-dimensional optimization problems, including deep network training.
Normalized SGD (NSGD) is a robust and efficient optimization algorithm for machine learning problems with heavy-tailed gradient noise, achieving optimal sample complexity without requiring the complex tuning of gradient clipping methods.
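A minimal sketch of the normalized update, assuming a momentum-averaged stochastic gradient scaled to unit norm on a toy problem with Student-t (heavy-tailed) gradient noise; the objective and hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def heavy_tailed_grad(x):
    """Gradient of f(x) = 0.5 * ||x||^2 corrupted by heavy-tailed Student-t noise."""
    return x + rng.standard_t(df=1.5, size=x.shape)

def nsgd(steps=3000, lr=0.01, beta=0.9, eps=1e-12):
    x = 5.0 * np.ones(20)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = heavy_tailed_grad(x)
        m = beta * m + (1.0 - beta) * g                # momentum averaging
        x = x - lr * m / (np.linalg.norm(m) + eps)     # unit-norm (normalized) step
    return x

print("distance to optimum:", np.linalg.norm(nsgd()))
```

Because the step length is fixed at the learning rate regardless of the gradient magnitude, a single heavy-tailed gradient sample cannot blow up the iterate, which is the intuition behind dropping the clipping threshold.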