The key insights from the paper are:
Gradient-based learning in multi-layer neural networks displays striking features: the empirical risk does not decrease monotonically, long plateaus are punctuated by rapid drops, and the learning process separates into distinct timescales.
The authors study the gradient flow dynamics of a wide two-layer neural network in a high-dimensional setting where the target function depends on the input only through a one-dimensional projection of the features (a single-index model).
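For concreteness, one standard formalization of this setup reads as follows (the notation here is chosen for illustration; the paper's conventions may differ):

$$ f_*(x) = \varphi\big(\langle \theta_*, x \rangle\big), \qquad \hat f(x; a, W) = \frac{1}{m} \sum_{j=1}^{m} a_j\, \sigma\big(\langle w_j, x \rangle\big), \qquad x \sim \mathsf{N}(0, I_d), $$

where $\varphi$ is the link function and $\theta_* \in \mathbb{R}^d$ is the hidden direction. Gradient flow then runs on the population risk $R(a, W) = \tfrac{1}{2}\,\mathbb{E}_x\big[(f_*(x) - \hat f(x; a, W))^2\big]$, i.e. $\dot\theta = -\nabla R(\theta)$ for the parameters $\theta = (a, W)$.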
Through a mixture of rigorous results, non-rigorous derivations, and numerical simulations, the authors propose a scenario for the learning dynamics in this setting. The proposed evolution exhibits separation of timescales and intermittency, which arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.
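To make "singularly perturbed" concrete, a generic fast-slow system (a textbook form, not the paper's specific equations) couples variables as

$$ \dot u = f(u, v), \qquad \epsilon\, \dot v = g(u, v), \qquad 0 < \epsilon \ll 1 . $$

Trajectories of such systems spend long stretches drifting near the slow manifold $\{g(u, v) = 0\}$, which show up as plateaus in the risk, punctuated by fast transitions between branches of that manifold, which show up as sudden drops.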
The authors show that the learning dynamics constructs a sequence of increasingly accurate polynomial approximations of the target function, one per phase of learning. This incremental learning behavior is formalized as the "canonical learning order".
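As a toy illustration of why successive polynomial approximations drive the risk down in stages: for Gaussian inputs, the natural polynomial basis is the Hermite one, and the best degree-$k$ polynomial approximation of a link function is its Hermite expansion truncated at degree $k$. The hypothetical sketch below (not the paper's code; the ReLU link and the cutoff K are arbitrary choices made here) computes these residual risks degree by degree:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coeffs(phi, K, n_quad=200):
    """Coefficients c_k = E[phi(G) He_k(G)] / k! for G ~ N(0,1), computed by
    Gauss-HermiteE quadrature (probabilists' He_k, with E[He_j He_k] = k! delta_jk)."""
    nodes, weights = hermegauss(n_quad)
    weights = weights / np.sqrt(2.0 * np.pi)   # reweight to the N(0,1) density
    vals = phi(nodes)
    c = np.empty(K + 1)
    for k in range(K + 1):
        basis = np.zeros(K + 1)
        basis[k] = 1.0                          # select He_k in the HermiteE basis
        c[k] = np.sum(weights * vals * hermeval(nodes, basis)) / factorial(k)
    return c

phi = lambda z: np.maximum(z, 0.0)              # example link function: ReLU
K = 8
c = hermite_coeffs(phi, K)
# Orthogonality gives E[phi(G)^2] = sum_k k! c_k^2, so the squared risk of the
# best degree-k polynomial approximation is the tail sum beyond degree k
# (truncated here at degree K, so high-degree residuals are underestimates).
energy = np.array([factorial(k) * c[k] ** 2 for k in range(K + 1)])
for k in range(K + 1):
    print(f"best degree-{k} approx: residual risk ~ {energy[k + 1:].sum():.5f}")
```

Each additional degree removes one more term from the residual, mirroring the staged drops in risk described above.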
The authors provide a detailed analysis of the gradient flow dynamics using tools from dynamical systems theory, in particular singular perturbation theory and matched asymptotic expansions. They prove the proposed scenario in several special cases and give a heuristic argument for its generality.
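As a reference point for the techniques named here, the classic textbook example of matched asymptotic expansions (not taken from the paper) is the boundary-value problem

$$ \epsilon y'' + y' + y = 0, \qquad y(0) = 0, \quad y(1) = 1 . $$

The outer solution (set $\epsilon = 0$) gives $y_{\mathrm{out}}(x) = e^{1-x}$ from the condition $y(1) = 1$; the inner solution in the stretched variable $X = x/\epsilon$ solves $Y'' + Y' = 0$ with $Y(0) = 0$, giving $Y(X) = A(1 - e^{-X})$; matching $\lim_{X \to \infty} Y = \lim_{x \to 0} y_{\mathrm{out}} = e$ fixes $A = e$, yielding the composite approximation $y(x) \approx e^{1-x} - e^{1 - x/\epsilon}$. An analogous inner/outer decomposition is what resolves the fast transitions between plateaus in the gradient-flow dynamics.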
The authors also discuss the implications of their findings for understanding generalization in deep learning: the relevant notion of complexity corresponds to the order in which the dynamics explores the solution space.