Core Concepts
This article presents a framework for the smooth optimization of explicitly regularized objectives for (structured) sparsity, enabling fully differentiable, approximation-free optimization that is compatible with the gradient descent paradigm ubiquitous in deep learning.
Abstract
The article presents a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, the proposed method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
The optimization transfer comprises an overparametrization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. The authors prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions.
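To make the idea concrete, here is a minimal toy sketch (our own illustration, not code from the article): the classic Hadamard-product overparametrization w = u·v turns a non-smooth L1-regularized problem into a smooth one, because min over u·v = w of (λ/2)(u² + v²) = λ|w| (by AM-GM, with equality at |u| = |v| = √|w|). Plain gradient descent on the smooth surrogate then recovers the soft-threshold solution of the original non-smooth problem; the target t, penalty strength λ, step size, and initialization below are assumed values chosen for illustration.

```python
# Toy sketch: smooth surrogate for a 1-D L1-regularized problem.
# Base problem:      min_w  0.5*(w - t)**2 + lam*|w|        (non-smooth)
# Surrogate problem: min_{u,v} 0.5*(u*v - t)**2 + (lam/2)*(u**2 + v**2)  (smooth)
# The surrogate induces the L1 penalty on w = u*v, so both share minimizers.

t, lam = 2.0, 0.5      # target and regularization strength (assumed values)
u, v = 1.0, 0.5        # generic non-zero initialization (avoids the saddle at 0)
lr = 0.1               # step size

for _ in range(5000):
    r = u * v - t                  # residual of the smooth data-fit term
    gu = r * v + lam * u           # d/du of the surrogate objective
    gv = r * u + lam * v           # d/dv of the surrogate objective
    u, v = u - lr * gu, v - lr * gv

w_smooth = u * v  # minimizer recovered via the smooth surrogate

# Closed-form solution of the base problem: soft-thresholding of t at lam.
w_l1 = (1.0 if t > 0 else -1.0) * max(abs(t) - lam, 0.0)

print(w_smooth, w_l1)  # both approximately 1.5
```

Note that the surrogate is non-convex in (u, v), yet gradient descent from a generic initialization still reaches the global minimizer here; the article's equivalence results (matching global and local minima) are what justify this behavior beyond toy cases.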
Additionally, the theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. The authors comprehensively review sparsity-inducing parametrizations across different fields that are covered by their general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of the approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.