Core Concepts
Multi-layer neural networks with randomly initialized weights can approximate functions that belong to the reproducing kernel Hilbert space (RKHS) defined by the Neural Network Gaussian Process (NNGP) kernel. The number of neurons required to achieve a certain approximation error is determined by the RKHS norm of the target function.
Abstract
The paper investigates the approximation power of multi-layer feedforward neural networks (NNs) with randomly initialized weights. It shows that in the infinite width limit, such NNs behave like a Gaussian Random Field whose covariance function is the Neural Network Gaussian Process (NNGP) kernel.
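To make the covariance structure concrete, the following is a minimal NumPy sketch (not the paper's code) of the NNGP kernel recursion for a fully connected ReLU network, using the Cho-Saul arc-cosine closed form for the post-activation expectation. The depth, the weight/bias variances, and the choice of ReLU are illustrative assumptions.

```python
import numpy as np

def nngp_relu_kernel(x1, x2, depth=3, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel of a depth-`depth` fully connected ReLU network at
    infinite width (Cho-Saul arc-cosine recursion, used here as a sketch).

    x1, x2: arrays of shape (n1, d) and (n2, d).
    Returns the (n1, n2) covariance matrix of the limiting Gaussian field.
    """
    d = x1.shape[1]
    # Layer-0 covariances (linear kernel of the input layer).
    k12 = sigma_b2 + sigma_w2 * (x1 @ x2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(x1 * x1, axis=1) / d   # diagonal for x1
    k22 = sigma_b2 + sigma_w2 * np.sum(x2 * x2, axis=1) / d   # diagonal for x2
    for _ in range(depth):
        norms = np.sqrt(np.outer(k11, k22))
        cos_t = np.clip(k12 / norms, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # Closed form for E[relu(u) relu(v)] with (u, v) centered bivariate Gaussian.
        e_cross = norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
        k12 = sigma_b2 + sigma_w2 * e_cross
        # Diagonal entries: E[relu(u)^2] = Var(u) / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12

# Example: draw one sample of the limiting Gaussian random field on the circle.
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 64, endpoint=False)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # points on S^1 in R^2
K = nngp_relu_kernel(X, X, depth=3)
sample = rng.multivariate_normal(np.zeros(len(X)), K + 1e-9 * np.eye(len(X)))
```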
The key insights are:
Every function in the RKHS defined by the NNGP kernel can be well approximated by the NN architecture.
To achieve a given approximation error, the required number of neurons in each layer is determined by the RKHS norm of the target function.
The approximation can be constructed from a supervised dataset by computing a random multi-layer representation of the input vectors and training only the last layer's weights (a minimal sketch of this construction appears after this list).
For a 2-layer NN on a domain equal to the (n-1)-dimensional sphere in R^n, the paper compares the number of neurons required by Barron's theorem with the number required by the multi-layer features construction. It shows that if the eigenvalues of the integral operator of the NNGP kernel decay more slowly than k^(-n-2/3), then the multi-layer features construction guarantees a more succinct neural network approximation than Barron's theorem (an empirical eigenvalue-decay check is sketched below).
Computational experiments verify the theoretical findings and show that realistic neural networks can easily learn target functions even in regimes where the theorems provide no guarantees.
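The last-layer-training construction mentioned above can be illustrated with a small NumPy sketch: the hidden layers are drawn at random and frozen, and only the output weights are fit to the data by ridge regression. The widths, variances, ridge parameter, and toy target below are hypothetical choices, not the paper's experiment.

```python
import numpy as np

def random_relu_features(X, widths=(256, 256), sigma_w=np.sqrt(2.0), seed=0):
    """Random multi-layer representation: fixed, randomly initialized ReLU layers.
    Only the representation is computed here; no hidden weights are trained."""
    rng = np.random.default_rng(seed)
    H = X
    for m in widths:
        W = rng.normal(0.0, sigma_w / np.sqrt(H.shape[1]), size=(H.shape[1], m))
        H = np.maximum(H @ W, 0.0)        # ReLU activations of a random layer
    return H

def fit_last_layer(H, y, ridge=1e-6):
    """Train only the output layer by ridge regression on the random features."""
    m = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(m), H.T @ y)

# Toy usage: approximate a smooth target on the unit circle from samples.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, size=500)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)
y = np.sin(3 * angles)                    # hypothetical target function
H = random_relu_features(X)
w = fit_last_layer(H, y)
print("train RMSE:", np.sqrt(np.mean((H @ w - y) ** 2)))
```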
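The eigenvalue-decay condition in the 2-layer comparison can also be probed numerically. The sketch below (an assumption-laden illustration, not the paper's procedure) approximates the NNGP integral operator on the sphere by the kernel matrix on uniformly sampled points and estimates the decay exponent of its spectrum; it reuses nngp_relu_kernel from the earlier sketch, and the dimension, depth, sample size, and index range are arbitrary.

```python
import numpy as np

# Assumes nngp_relu_kernel from the earlier sketch is in scope.
rng = np.random.default_rng(2)
n, num_points = 3, 2000
X = rng.normal(size=(num_points, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # uniform points on S^(n-1)

K = nngp_relu_kernel(X, X, depth=2)
# Nystrom-style estimates of the integral operator's eigenvalues.
eigvals = np.sort(np.linalg.eigvalsh(K))[::-1] / num_points

# Log-log slope over a mid-range of indices gives a rough decay exponent,
# to compare against the k^(-n-2/3) threshold from the 2-layer comparison.
k = np.arange(10, 200)
slope = np.polyfit(np.log(k), np.log(eigvals[k]), 1)[0]
print("estimated decay exponent:", slope)
```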
Stats
There are no key metrics or important figures used to support the author's main arguments.
Quotes
There are no striking quotes supporting the author's main arguments.