Basic Concepts
RAMDA is the first regularized adaptive method that outputs models with strong predictive performance while also attaining a locally optimal structure.
Summary
The article introduces the Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks. To make regularized adaptive methods practical, it proposes an implementable inexactness condition together with an efficient subproblem solver. RAMDA is guaranteed to identify locally optimal structures and outperforms existing methods on tasks spanning computer vision, language modeling, and speech synthesis.
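The core computational step in methods of this kind is an approximately solved regularized subproblem. The sketch below is an illustration under stated assumptions, not the paper's exact algorithm: it approximately minimizes a quadratic model plus a group-lasso regularizer via proximal gradient steps, stopping once a measurable residual falls below a tolerance (a stand-in for an implementable inexactness condition). All function and parameter names are hypothetical.

```python
import numpy as np

def prox_group_l1(w, groups, lam):
    """Proximal operator of the group-lasso regularizer lam * sum_g ||w_g||_2.
    Shrinks each group's norm by lam and zeros out groups below the threshold."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

def inexact_subproblem_solver(v, c, d, groups, lam, eps=1e-6, max_iter=1000):
    """Approximately solve
        min_w  <v, w> + 0.5 * (w - c)^T D (w - c) + lam * sum_g ||w_g||_2
    with D = diag(d), via proximal gradient. Stops once the proximal-gradient
    residual drops below eps, an illustrative inexactness criterion."""
    w = c.copy()
    step = 1.0 / d.max()  # smooth part has Lipschitz gradient with constant max(d)
    for _ in range(max_iter):
        grad = v + d * (w - c)
        w_new = prox_group_l1(w - step * grad, groups, step * lam)
        if np.linalg.norm(w_new - w) <= eps * step:
            return w_new
        w = w_new
    return w
```

On a toy problem this solver zeroes out entire groups whose contribution is too small to pay for the regularization, which is exactly the mechanism that produces structured (group-level) sparsity.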
- Introduction
  - The challenge posed by ever-larger neural network models.
  - The importance of exploiting structure to reduce costs.
- Regularized Adaptive Methods
  - Inducing desirable structures with regularizers.
  - Challenges in solving the resulting subproblems efficiently.
- Manifold Identification
  - The theory behind identifying locally optimal structures.
- Algorithm Description
  - RAMDA's approach to finding locally optimal structures.
- Related Work
  - Comparison with existing regularized methods.
- Subproblem Solver Efficiency
  - Effectiveness of the proposed subproblem solver.
- Experiments
  - Results on image classification, language modeling, and speech synthesis tasks.
- Conclusions
  - RAMDA's superiority in achieving high structured sparsity together with strong prediction performance.
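The "structured sparsity" the conclusions refer to is commonly measured at the group level rather than the entry level: a group (e.g., a neuron's or channel's weights) only yields real savings when all of its entries are zero. A minimal sketch of such a metric, assuming a simple group-wise zero ratio (this is a generic illustration, not the paper's exact metric):

```python
import numpy as np

def group_sparsity(weights, groups, tol=0.0):
    """Fraction of parameter groups whose entries are all (numerically) zero.
    Higher values mean more exploitable structure, e.g., whole neurons or
    channels that can be pruned from the network."""
    zero_groups = sum(1 for g in groups if np.all(np.abs(weights[g]) <= tol))
    return zero_groups / len(groups)
```

For example, a weight vector whose second group is entirely zero but whose first group is not has a group sparsity of 0.5 even if half of its individual entries are zero.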
Statistics
We propose an implementable inexactness condition that retains convergence guarantees similar to exact versions.
Extensive experiments show RAMDA consistently outperforms state-of-the-art methods.
Quotes
"RAMDA is guaranteed to obtain the best structure possible among all methods converging to the same point."
"Extensive numerical experiments show that RAMDA is efficient and consistently outperforms the state of the art."