Key concepts
Both shaped neural networks, whose activation functions are scaled as the network size grows, and unshaped neural networks, whose activation functions are left unchanged, admit differential equation-based asymptotic characterizations.
Summary
The content discusses the scaling limits of shaped and unshaped neural networks.
Key highlights:
Shaped neural networks with activation functions scaled as the network size grows have a differential equation-based asymptotic characterization, described by a Neural Covariance stochastic differential equation (SDE).
Unshaped neural networks, whose activation function is left unchanged as the network size grows, admit a similar differential equation-based asymptotic characterization.
For a fully connected ResNet with a d^(-1/2) factor on the residual branch, and a multilayer perceptron (MLP) with depth d ≪ width n and shaped ReLU activation at rate d^(-1/2), the two architectures converge to the same infinite-depth-and-width limit at initialization.
For an unshaped MLP at initialization, the authors derive the first-order asymptotic correction to the layerwise correlation, which is closely approximated by an SDE.
These results provide a connection between shaped and unshaped network architectures, and open up the possibility of studying the effect of normalization methods and how they connect with shaping activation functions.
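As a rough numerical illustration of the equivalence claimed above, the sketch below tracks the correlation of two hidden representations at initialization for a shaped-ReLU MLP and for a ResNet with a d^(-1/2) factor on the residual branch. The slope constants `c_plus`/`c_minus`, the 1/n weight variance, and all function names are illustrative assumptions, not values taken from the paper; the precise correspondence of constants between the two architectures is worked out in the paper itself.

```python
import numpy as np

def shaped_relu(x, d, c_plus=1.0, c_minus=-1.0):
    """Shaped ReLU at rate d^(-1/2): piecewise-linear with slopes
    1 + c/sqrt(d), so the activation approaches the identity as depth d
    grows. c_plus/c_minus are illustrative choices, not from the paper."""
    return ((1.0 + c_plus / np.sqrt(d)) * np.maximum(x, 0.0)
            + (1.0 + c_minus / np.sqrt(d)) * np.minimum(x, 0.0))

def layerwise_correlation(mode, n=400, d=64, rho0=0.5, seed=0):
    """Correlation of two hidden representations after d layers at
    initialization, for a shaped-ReLU MLP ("shaped_mlp") or a fully
    connected ResNet with a d^(-1/2) residual branch ("resnet")."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((2, n))
    # Two unit-variance inputs with initial correlation rho0.
    x = np.stack([z[0], rho0 * z[0] + np.sqrt(1.0 - rho0**2) * z[1]])
    for _ in range(d):
        W = rng.standard_normal((n, n)) / np.sqrt(n)  # variance-1/n init
        if mode == "shaped_mlp":
            x = shaped_relu(x @ W.T, d)
        elif mode == "resnet":
            # Plain ReLU on the branch, scaled by d^(-1/2).
            x = x + np.maximum(x @ W.T, 0.0) / np.sqrt(d)
        else:
            raise ValueError(mode)
    c = x @ x.T
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

print(layerwise_correlation("shaped_mlp"))
print(layerwise_correlation("resnet"))
```

In the joint limit d, n → ∞ with d ≪ n, both correlation curves should be described by the same Neural Covariance SDE; at finite n and d they fluctuate on the order of sqrt(d/n), so a single run gives only a noisy snapshot of the limit.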