Core Concepts
Extending the Adam optimizer with higher-order implicit-explicit (IMEX) time-stepping methods can improve neural network training.
Summary
The report discusses enhancing the Adam optimizer using implicit-explicit (IMEX) time-stepping approaches. It examines the ordinary differential equation underlying Adam, proposes new optimization algorithms based on higher-order IMEX methods, and presents results from experiments on regression, classification, and deep learning tasks.
Computational Science Laboratory Report:
- Authors: Abhinab Bhattacharjee, Andrey A. Popov, Arash Sarshar, Adrian Sandu.
- Affiliation: Virginia Tech's Computational Science Lab.
- Abstract: Proposes new extensions of the Adam scheme using higher-order IMEX methods for improved neural network training.
Introduction:
- Neural network training involves solving unconstrained optimization problems.
- Stochastic gradient descent (SGD) and momentum methods are discussed (a minimal momentum step is sketched after this list).
- First-order adaptive optimization methods like Adam are introduced.
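
To make the momentum idea above concrete, here is a minimal NumPy sketch of one SGD-with-momentum (heavy-ball) step on a toy quadratic loss. The function name, hyperparameter values, and toy objective are illustrative assumptions, not taken from the report.

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
    """One SGD-with-momentum (heavy-ball) update; names and defaults are illustrative."""
    velocity = beta * velocity + grad   # exponential accumulation of past gradients
    theta = theta - lr * velocity       # move against the accumulated direction
    return theta, velocity

# Toy usage on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(200):
    theta, velocity = sgd_momentum_step(theta, velocity, grad=theta)
print(theta)  # tends toward the minimizer at the origin
```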
Background:
- The performance of optimization methods for neural network training is reviewed.
- Optimization in the continuous-time (ODE) and discrete-time settings is explained.
Proposed Methodology:
- The IMEX GARK (Generalized Additive Runge-Kutta) time integration framework is introduced for ODEs that are additively split into stiff and non-stiff parts (a generic first-order IMEX Euler step is sketched after this list).
- The Adam algorithm is shown to be an IMEX Euler discretization of its underlying ODE.
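
For reference, the generic first-order IMEX Euler step that the GARK framework generalizes treats an additively split ODE y' = f(y) + g(y) by taking an explicit Euler step in the non-stiff part f and an implicit Euler step in the stiff part g. This is the standard textbook scheme, not a formula copied from the report:

```latex
% One IMEX Euler step of size h for y' = f(y) + g(y),
% with f non-stiff (treated explicitly) and g stiff (treated implicitly):
y_{n+1} = y_n + h \, f(y_n) + h \, g(y_{n+1})
```

Reading Adam's update as one such step (within the more general GARK tableau) is what opens the door to replacing it with higher-order IMEX schemes.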
Numerical Experiments:
- Lorenz '63 Dynamics: Higher-order Runge-Kutta methods outperform lower-order ones in this regression task.
- Sum of Gaussians: Traditional higher-order Runge-Kutta methods give promising results on this nonlinear regression problem.
- Spiral Dataset: The higher-order RK method shows advantages over the first-order method in classification tasks with deep architectures.
- MNIST Dataset: The new Trapezoidal method performs similarly to standard Adam on handwritten-digit classification (an illustrative second-order step is sketched after this list).
- CIFAR10 with VGG: No significant advantage is observed for the new Trapezoidal method over standard Adam on this image classification task.
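
To illustrate what a second-order ("trapezoidal") time step means in this context, here is the classical explicit trapezoidal (Heun) rule applied to plain gradient flow theta' = -grad f(theta). This is a generic textbook scheme shown for illustration only; the report's IMEX Trapezoidal optimizer also evolves Adam's moment variables and differs in detail.

```python
import numpy as np

def heun_gradient_step(theta, grad_fn, lr=0.1):
    """Explicit-trapezoidal (Heun) step on gradient flow theta' = -grad_fn(theta).

    A generic second-order alternative to the plain explicit Euler step
    theta - lr * grad_fn(theta); illustrative, not the report's scheme.
    """
    g0 = grad_fn(theta)                    # slope at the current point
    theta_pred = theta - lr * g0           # explicit Euler predictor
    g1 = grad_fn(theta_pred)               # slope at the predicted point
    return theta - 0.5 * lr * (g0 + g1)    # trapezoidal corrector: average the two slopes

# Toy usage on f(theta) = 0.5 * ||theta||^2 (gradient = theta).
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = heun_gradient_step(theta, grad_fn=lambda t: t)
print(theta)  # tends toward the origin
```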
Stats
Adam is a first-order Generalized Additive Runge-Kutta (GARK) IMEX Euler discretization of the underlying ODE.
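
For context on this statement, the standard Adam recursion is reproduced below in NumPy. The update formulas are the well-known ones from Kingma and Ba; the comment on the IMEX reading paraphrases the report's claim (the gradient is evaluated at the old iterate, while the parameter update uses the freshly computed moments) rather than reproducing its derivation.

```python
import numpy as np

def adam_step(theta, m, v, grad, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update.

    Per the report, this recursion coincides with one step of a first-order
    GARK IMEX Euler method applied to Adam's underlying ODE: the gradient is
    evaluated at the old iterate, while the parameter update uses the freshly
    updated moments (see the report for the exact explicit/implicit partition).
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad**2       # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1**t)                  # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage on f(theta) = 0.5 * ||theta||^2 (gradient = theta).
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, m, v, grad=theta, t=t)
print(theta)  # settles near the origin, up to Adam's characteristic small oscillations
```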
Citations
"The novel contributions include interpreting Adam as a GARK IMEX Euler discretization" - Authors