Core Concepts

A new normal map-based proximal random reshuffling (norm-PRR) method is proposed for solving nonsmooth nonconvex finite-sum optimization problems. Norm-PRR achieves improved iteration complexity bounds compared to existing proximal-type random reshuffling methods, and also exhibits strong asymptotic convergence guarantees.

Abstract

The paper presents a new proximal random reshuffling algorithm called norm-PRR for solving nonsmooth nonconvex finite-sum optimization problems. The key contributions are:
Complexity Analysis:
Norm-PRR achieves an iteration complexity of O(n^(-1/3)T^(-2/3)) in expectation, improving over the currently known bounds for this class of problems.
Norm-PRR also has a deterministic complexity bound of O(T^(-2/3)).
These complexity results match the best known bounds for random reshuffling methods in the smooth nonconvex setting.
Asymptotic Convergence:
Under suitable step size conditions, norm-PRR converges globally: the stationarity measure dist(0, ∂ψ(w^k)) tends to 0 and the objective values ψ(w^k) converge.
For diminishing step sizes, the whole sequence of iterates {w^k} is proven to converge to a single stationary point.
Quantitative asymptotic convergence rates are derived that can match those known for the smooth, strongly convex setting.
Numerical Experiments:
Experiments on nonconvex classification tasks demonstrate the efficiency of the proposed norm-PRR approach.
The key innovation of norm-PRR is the use of the normal map, which allows for better compatibility with without-replacement sampling schemes compared to existing proximal-type random reshuffling methods.
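To make the normal map idea concrete, here is a loose sketch of a normal-map-based proximal step with without-replacement (reshuffled) sampling. The problem instance (ℓ1-regularized least squares, which is in fact convex), the step-size choice, and the exact inner-loop update are illustrative assumptions, not the paper's formulation; the normal map evaluated at an auxiliary point z combines a component gradient at w = prox(z) with the residual (z − w)/α.

```python
import numpy as np

# Illustrative problem (not from the paper): psi(w) = (1/n) sum_i f(w, i) + phi(w)
# with smooth components f(w, i) = 0.5 * (a_i . w - b_i)^2 and nonsmooth phi = lam * ||.||_1.
rng = np.random.default_rng(0)
n, d = 8, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
lam = 0.05

def grad_f(w, i):
    # gradient of the i-th smooth component
    return A[i] * (A[i] @ w - b[i])

def prox_phi(z, step):
    # proximal operator of step * lam * ||.||_1 (soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

alpha = 0.05      # step size (assumed constant here; the paper also treats diminishing steps)
z = np.zeros(d)   # auxiliary variable; the iterate of interest is w = prox_phi(z, alpha)
for epoch in range(300):
    perm = rng.permutation(n)        # reshuffle: without-replacement sampling each epoch
    for i in perm:
        w = prox_phi(z, alpha)       # proximal point entering the normal map
        # normal-map direction: component gradient plus the residual (z - w) / alpha
        z = z - (alpha / n) * (grad_f(w, i) + (z - w) / alpha)
w_final = prox_phi(z, alpha)
```

Because gradients are taken at the proximal point rather than at z itself, each epoch's reshuffled pass only needs component gradients and soft-thresholding, which is what makes the scheme compatible with without-replacement sampling.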

Stats

∥∇f(w, i)∥² ≤ 2L [f(w, i) − f_lb]
σ_k² ≤ 2L [ψ(w^k) − ψ_lb]
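The first bound is the standard gradient-norm estimate for L-smooth functions bounded below by f_lb. A quick numerical sanity check on a toy quadratic instance (an assumption for illustration, with f_lb = 0 and L = max_i ∥a_i∥²; for these quadratics the bound holds with equality per component):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# f(w, i) = 0.5 * (a_i . w - b_i)^2 is L-smooth with L = max_i ||a_i||^2,
# and bounded below by f_lb = 0.
L = max(float(A[i] @ A[i]) for i in range(n))
f_lb = 0.0

def f(w, i):
    return 0.5 * (A[i] @ w - b[i]) ** 2

def grad_f(w, i):
    return A[i] * (A[i] @ w - b[i])

# check ||grad f(w, i)||^2 <= 2L [f(w, i) - f_lb] at random points
for _ in range(100):
    w = rng.normal(size=d)
    for i in range(n):
        lhs = float(grad_f(w, i) @ grad_f(w, i))
        rhs = 2.0 * L * (f(w, i) - f_lb)
        assert lhs <= rhs + 1e-9
```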

Quotes

"Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, much less studies seem available in the nonsmooth case."
"We show that norm-PRR achieves the iteration complexity O(n^(-1/3)T^(-2/3)) where n denotes the number of component functions f(·, i) and T counts the total number of iterations. This improves the currently known complexity bounds for this class of problems by a factor of n^(-1/3)."
"Moreover, we derive last iterate convergence rates that can match those in the smooth, strongly convex setting."

Key Insights Distilled From

by Junwen Qiu, X... at **arxiv.org** 05-01-2024

Deeper Inquiries

The norm-PRR method could be extended to handle stochastic constraints, or composite structures more general than the finite-sum form, by adding terms to the update rule that account for the stochastic nature of the constraints. One approach is to introduce a regularization term that penalizes constraint violations, analogous to how the weakly convex function φ penalizes deviations from the desired solution in the current formulation. With such modifications, norm-PRR could be adapted to a wider range of optimization problems with stochastic elements.

One potential limitation of the normal map-based approach compared to other proximal-type random reshuffling methods is the reliance on the normal map as a stationarity measure. While the normal map provides a useful tool for analyzing convergence properties and establishing descent guarantees, it may introduce additional computational complexity in practice. Calculating the normal map at each iteration can be computationally expensive, especially in high-dimensional optimization problems with complex structures. Additionally, the normal map may not always accurately capture the local geometry of the objective function, leading to suboptimal convergence behavior in certain scenarios.

To strengthen the asymptotic convergence analysis of norm-PRR and establish non-asymptotic linear convergence rates under suitable geometric conditions, one approach could be to refine the analysis of the error terms and their impact on the convergence properties of the algorithm. By conducting a more detailed investigation of the error bounds and their relationship to the convergence rates, it may be possible to derive tighter bounds that guarantee non-asymptotic linear convergence under specific geometric conditions. Additionally, exploring the interplay between the step sizes, variance terms, and the geometry of the objective function could provide insights into how to optimize the algorithm for faster convergence rates in practical settings.
