Core Concepts
A new normal map-based proximal random reshuffling (norm-PRR) method is proposed for solving nonsmooth nonconvex finite-sum optimization problems. Norm-PRR achieves improved iteration complexity bounds compared to existing proximal-type random reshuffling methods, and also exhibits strong asymptotic convergence guarantees.
Abstract
The paper presents a new proximal random reshuffling algorithm called norm-PRR for solving nonsmooth nonconvex finite-sum optimization problems. The key contributions are:
Complexity Analysis:
- Norm-PRR achieves an iteration complexity of O(n^(-1/3)T^(-2/3)) in expectation, where n is the number of component functions and T is the total number of iterations; this improves the currently known bounds for this class of problems by a factor of n^(-1/3) (see the short derivation after this list).
- Norm-PRR also has a deterministic complexity bound of O(T^(-2/3)).
- These complexity results match the best known bounds for random reshuffling methods in the smooth nonconvex setting.
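A practical reading of the in-expectation bound follows by pure algebra on the stated rate (constants and the exact stationarity measure are hidden here; the paper makes them precise):

```latex
% If the stationarity measure after T iterations is bounded by C n^{-1/3} T^{-2/3},
% then it drops below a target accuracy \varepsilon as soon as
\[
  C\, n^{-1/3}\, T^{-2/3} \le \varepsilon
  \quad\Longleftrightarrow\quad
  T \;\ge\; C^{3/2}\, n^{-1/2}\, \varepsilon^{-3/2},
\]
% i.e., T = O(n^{-1/2} \varepsilon^{-3/2}) iterations suffice, so the required
% total iteration count decreases as the number of components n grows.
```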
Asymptotic Convergence:
- Under suitable step size conditions, norm-PRR is shown to converge globally, with the stationarity measure dist(0, ∂ψ(wk)) converging to 0 and the objective function values ψ(wk) converging to a finite limit (the link between this measure and the normal map is sketched in the note after this list).
- For diminishing step sizes, the whole sequence of iterates {wk} is proven to converge to a single stationary point.
- Quantitative asymptotic convergence rates are derived that can match those in the smooth, strongly convex setting.
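For context on why a normal map residual controls the stationarity measure above, here is the standard argument, assuming the usual composite split ψ = f + φ with f smooth and the nonsmooth part φ convex and prox-friendly (the paper's exact assumptions may differ):

```latex
% With w = prox_{\lambda\varphi}(z), the prox optimality condition gives
% (z - w)/\lambda \in \partial\varphi(w), hence
\[
  F^{\mathrm{nor}}_{\lambda}(z) \;:=\; \nabla f(w) + \tfrac{1}{\lambda}(z - w)
  \;\in\; \nabla f(w) + \partial\varphi(w) \;=\; \partial\psi(w),
\]
% so dist(0, \partial\psi(w)) \le \|F^{nor}_\lambda(z)\|, and driving the normal map
% residual to zero also drives the stationarity measure dist(0, \partial\psi(w^k)) to zero.
```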
Numerical Experiments:
- Experiments on nonconvex classification tasks demonstrate the efficiency of the proposed norm-PRR approach.
The key innovation of norm-PRR is its use of the normal map, which makes the method more compatible with without-replacement sampling schemes than existing proximal-type random reshuffling approaches; an illustrative sketch is given below.
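To make the normal map concrete, here is a minimal sketch: a reshuffled, proximal-gradient-style epoch on a toy ℓ1-regularized logistic problem, with the normal map residual used as the stationarity measure. The toy data, step sizes, and per-component update rule are illustrative assumptions, not the paper's exact norm-PRR iteration.

```python
import numpy as np

# Toy problem (illustrative only): psi(w) = (1/n) sum_i f(w, i) + mu * ||w||_1,
# where f(w, i) is a logistic loss on a random sample (x_i, y_i).
rng = np.random.default_rng(0)
n, d = 32, 5
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
mu, lam, alpha = 0.05, 0.5, 0.1        # l1 weight, prox parameter, step size

def grad_f(w, i):
    # Gradient of the i-th logistic component f(w, i) = log(1 + exp(-y_i * x_i.w)).
    t = -y[i] * (X[i] @ w)
    return (-y[i] / (1.0 + np.exp(-t))) * X[i]

def prox_l1(z, t):
    # Proximal operator of t * mu * ||.||_1 (soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t * mu, 0.0)

def normal_map(z):
    # Robinson's normal map: F_nor(z) = grad f(prox(z)) + (z - prox(z)) / lam;
    # it vanishes exactly when prox_l1(z, lam) is a stationary point of psi.
    w = prox_l1(z, lam)
    full_grad = np.mean([grad_f(w, i) for i in range(n)], axis=0)
    return full_grad + (z - w) / lam

# Schematic reshuffling loop (one plausible reading, not the exact norm-PRR update):
z = rng.normal(size=d)
for epoch in range(50):
    perm = rng.permutation(n)               # without-replacement sampling
    for i in perm:
        w = prox_l1(z, lam)                 # proximal point of the current z
        z = z - alpha * (grad_f(w, i) + (z - w) / lam)
    print(epoch, np.linalg.norm(normal_map(z)))   # stationarity proxy per epoch
```

The residual printed after each epoch is a proxy for the stationarity measure whose decay the complexity and convergence results above describe.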
Stats
∥∇f(w, i)∥^2 ≤ 2L[f(w, i) - f_lb]
σ^2_k ≤ 2L[ψ(wk) - ψ_lb]
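The first line is the standard bound ∥∇f(w, i)∥^2 ≤ 2L[f(w, i) - f_lb] that holds for any L-smooth function bounded below by f_lb; the second applies an analogous bound to the quantity σ^2_k in terms of the gap ψ(wk) - ψ_lb. A quick numerical sanity check of the first inequality on an illustrative L-smooth quadratic (an assumption, not the paper's setting):

```python
import numpy as np

# Sanity check of ||grad f(w)||^2 <= 2L (f(w) - f_lb) for an illustrative
# L-smooth quadratic f(w) = 0.5 * w.T @ A @ w with f_lb = 0.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B @ B.T + np.eye(4)                 # symmetric positive definite Hessian
L = np.linalg.eigvalsh(A).max()         # smoothness constant = largest eigenvalue
f = lambda w: 0.5 * w @ A @ w           # minimum value f_lb = 0 attained at w = 0
grad = lambda w: A @ w

for _ in range(1000):
    w = 10.0 * rng.normal(size=4)
    assert grad(w) @ grad(w) <= 2.0 * L * (f(w) - 0.0) + 1e-9
print("||grad f(w)||^2 <= 2L (f(w) - f_lb) held on all sampled points")
```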
Quotes
"Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, much less studies seem available in the nonsmooth case."
"We show that norm-PRR achieves the iteration complexity O(n^(-1/3)T^(-2/3)) where n denotes the number of component functions f(·, i) and T counts the total number of iterations. This improves the currently known complexity bounds for this class of problems by a factor of n^(-1/3)."
"Moreover, we derive last iterate convergence rates that can match those in the smooth, strongly convex setting."