Core Concepts
Deep neural networks can universally approximate any continuous function and interpolate any finite dataset of distinct points, provided the activation function is locally integrable and not affine.
Abstract
The paper establishes several key results regarding the approximation and interpolation capabilities of deep neural networks:
Interpolation of Deep Neural Networks:
It is proven that, in the overparameterized regime, deep neural networks with at least d neurons in each hidden layer can interpolate any dataset of d distinct points, provided the activation function is locally integrable and not affine (see the numerical sketch after this list).
For shallow neural networks with polynomial activation functions, whether interpolation is possible depends on the degree of the polynomial and the input dimension.
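A minimal numerical sketch of the interpolation claim, assuming a one-hidden-layer network with tanh activation (locally integrable and not affine) and exactly d hidden neurons; the setup and all variable names are illustrative, not taken from the paper. The input-to-hidden weights are fixed at random and the d output weights are then solved for exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p = 10, 3                      # d data points in R^p
X = rng.standard_normal((d, p))   # inputs (distinct with probability one)
y = rng.standard_normal(d)        # arbitrary targets

# One hidden layer with exactly d neurons and tanh activation
# (locally integrable, not affine).
W = rng.standard_normal((p, d))   # random input-to-hidden weights
b = rng.standard_normal(d)        # random hidden biases
Phi = np.tanh(X @ W + b)          # d x d hidden feature matrix

# Generically Phi is invertible, so output weights that interpolate
# every data point can be obtained by solving one linear system.
a = np.linalg.solve(Phi, y)

print("max |f(x_i) - y_i| =", np.max(np.abs(Phi @ a - y)))  # ~1e-12
```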
Universal Approximation of Deep Neural Networks:
It is shown that the set of deep neural networks is dense in the space of continuous functions on R^p if and only if the activation function is not an affine function.
The eigenspectrum of the Hessian of the loss function at the global minima is characterized: it has d positive eigenvalues and n - d zero eigenvalues, where d is the number of data points and n is the total number of network parameters (a numerical check follows this list).
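The eigenvalue count can be sanity-checked numerically. The sketch below (same hypothetical setup as above) uses the fact that for a squared loss with zero residual, the Hessian reduces exactly to the Gauss-Newton term J^T J, where J is the d x n Jacobian of the network outputs with respect to all n parameters, so at most d eigenvalues can be positive.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 10, 3
X = rng.standard_normal((d, p))
y = rng.standard_normal(d)

# Interpolating network from the previous sketch: f(x) = a . tanh(W^T x + b)
W = rng.standard_normal((p, d))
b = rng.standard_normal(d)
Phi = np.tanh(X @ W + b)          # (d, d) hidden features
a = np.linalg.solve(Phi, y)       # zero training loss

# Jacobian J of the d outputs w.r.t. all n = (p + 2) * d parameters.
S = 1.0 - Phi**2                                   # tanh'(X W + b), (d, d)
J_a = Phi                                          # df_i / da_j
J_b = a * S                                        # df_i / db_j
J_W = (a * S)[:, None, :] * X[:, :, None]          # df_i / dW_kj, (d, p, d)
J = np.hstack([J_W.reshape(d, -1), J_b, J_a])      # (d, n)

# At a zero-residual minimum of the squared loss, Hessian = J^T J exactly
# (the residual-weighted second-order term vanishes).
H = J.T @ J
eigs = np.linalg.eigvalsh(H)
n = J.shape[1]
pos = np.sum(eigs > 1e-8 * eigs.max())
print(f"n = {n}: {pos} positive eigenvalues, {n - pos} zero eigenvalues")
# Expect d = 10 positive and n - d = 40 zero eigenvalues.
```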
Convergence to Global Minima:
A practical probabilistic method for finding an interpolating set of weights is provided: the input-to-hidden weights are initialized at random, after which only the output-layer weights need to be optimized (see the sketch after this list).
This method is extended to deep neural networks with polynomial activation functions of low degree.
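A small Monte Carlo sketch of why this random initialization almost always succeeds, under the same hypothetical setup as above: the output-layer solve succeeds exactly when the hidden feature matrix is invertible, and the theory predicts that failures form a measure-zero event.

```python
import numpy as np

d, p, trials = 10, 3, 1000
rng = np.random.default_rng(0)
X = rng.standard_normal((d, p))   # fixed dataset of d distinct points

ok = 0
for _ in range(trials):
    W = rng.standard_normal((p, d))
    b = rng.standard_normal(d)
    Phi = np.tanh(X @ W + b)
    # Invertible feature matrix <=> the output-layer solve interpolates.
    if np.linalg.matrix_rank(Phi) == d:
        ok += 1

print(f"{ok}/{trials} random initializations gave an invertible feature matrix")
# Failure is a measure-zero event, so the ratio should be 1.
```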
These results establish the universal approximation and interpolation capabilities of deep neural networks under general conditions on the activation function, giving theoretical support for their effectiveness in practice.