Core Concepts
Quantitative bounds on the Wasserstein distance between the output of wide random neural networks and the corresponding Gaussian random field.
Abstract
The paper derives upper bounds on the Wasserstein distance (W1), with respect to the sup-norm, between any continuous R^d-valued random field indexed by the n-sphere and the corresponding Gaussian random field. This is achieved by developing a novel Gaussian smoothing technique based on covariance functions constructed from powers of the Laplacian operator on the sphere.
The authors then apply this general result to obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and with Lipschitz activation functions. The bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. Improved bounds are also obtained when the activation function has three bounded derivatives.
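For reference, the W1 distance with respect to the sup-norm mentioned above admits the standard Kantorovich-Rubinstein dual form below. This is a generic formulation for random fields viewed as random elements of C(S^n; R^d), not a verbatim restatement of the paper's definition.

```latex
% W1 distance between random fields F and G indexed by the n-sphere,
% viewed as random elements of C(S^n; R^d) equipped with the sup-norm
% (Kantorovich--Rubinstein dual form).
\[
  W_1(F, G)
  = \sup_{\|h\|_{\mathrm{Lip}} \le 1}
    \bigl|\,\mathbb{E}\, h(F) - \mathbb{E}\, h(G)\,\bigr|,
  \qquad
  \|h\|_{\mathrm{Lip}}
  = \sup_{f \neq g}
    \frac{|h(f) - h(g)|}{\sup_{x \in \mathbb{S}^n} \|f(x) - g(x)\|},
\]
where the supremum is over functionals $h : C(\mathbb{S}^n; \mathbb{R}^d) \to \mathbb{R}$.
```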
The key steps are:
Construct a Gaussian smoothing random field S with an explicit Cameron-Martin space, using the Laplacian operator on the n-sphere (a schematic construction is sketched after this list).
Develop a general approach based on Stein's method to bound the W1 distance between a random field F and a Gaussian random field G: first bound the distance in a smoother metric d_F defined through smooth test functions, then transfer the bound to W1 using the Gaussian smoothing (the transfer argument is sketched after this list).
Apply the general result to wide random neural networks, bounding the W1 distance between the network output F and the corresponding Gaussian random field G. This involves applying the Stein's method approximation recursively across the layers, combined with the Gaussian smoothing (a simulation sketch of this setting follows the list).
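A schematic version of the smoothing field from the first step, under illustrative assumptions (the exponent s and the normalization are placeholders; the paper fixes its own choices): S is built from a spherical-harmonic expansion whose covariance operator is a negative power of a shifted Laplace-Beltrami operator, so that its Cameron-Martin space is an explicit Sobolev space.

```latex
% Schematic construction of a Gaussian smoothing field S on the n-sphere.
% Y_{l,m}: real spherical harmonics, with -\Delta_{S^n} Y_{l,m} = l(l+n-1) Y_{l,m};
% \xi_{l,m}: i.i.d. standard Gaussians; s > 0 taken large enough for convergence
% (illustrative choice, not the paper's exact exponent).
\[
  S(x) = \sum_{l \ge 0} \sum_{m} \bigl(1 + l(l+n-1)\bigr)^{-s/2}\, \xi_{l,m}\, Y_{l,m}(x),
  \qquad
  \operatorname{Cov}\bigl(S(x), S(y)\bigr) = \bigl(I - \Delta_{\mathbb{S}^n}\bigr)^{-s}(x, y),
\]
so that the Cameron-Martin space of $S$ is the Sobolev space $H^{s}(\mathbb{S}^n)$.
```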
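The transfer from the smooth metric to W1 in the second step can be sketched as a generic smoothing argument; the constants and the rate at which the cost blows up as the smoothing parameter shrinks are placeholders for the paper's precise statement.

```latex
% Generic smoothing/transfer argument: perturb both fields by an independent
% copy of \varepsilon S, compare the smoothed fields, and remove the
% perturbation at a cost of order \varepsilon.
\[
  W_1(F, G)
  \le W_1(F,\, F + \varepsilon S)
    + W_1(F + \varepsilon S,\, G + \varepsilon S)
    + W_1(G + \varepsilon S,\, G),
\]
where the outer terms are at most $\varepsilon\, \mathbb{E}\sup_{x \in \mathbb{S}^n}\|S(x)\|$
(couple $F$ with $F + \varepsilon S$ directly), while the middle term is controlled by the
smoother metric $d_{\mathcal{F}}(F, G)$, bounded via Stein's method, at a cost that grows
as $\varepsilon \to 0$; optimizing over $\varepsilon$ yields the final W1 bound.
```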
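A minimal simulation sketch of the third step's setting, assuming i.i.d. standard Gaussian weights with 1/sqrt(fan-in) scaling and a ReLU activation (all illustrative choices; the paper covers general Lipschitz activations and weight laws satisfying moment conditions). It draws a wide random network at a few points of the 2-sphere and reports the empirical covariance, which should approach that of the limiting Gaussian field as the widths grow.

```python
import numpy as np

rng = np.random.default_rng(0)


def sphere_points(num_points, n):
    """Sample points uniformly on the n-sphere S^n, embedded in R^{n+1}."""
    x = rng.standard_normal((num_points, n + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def wide_random_network(x, widths, d_out):
    """Evaluate one draw of a random fully connected ReLU network at inputs x.

    Weights are i.i.d. N(0, 1), each layer scaled by 1/sqrt(fan_in), so the
    output approaches a Gaussian random field as the widths grow (illustrative
    initialization; the paper allows more general weight distributions and
    Lipschitz activations).
    """
    h = x
    for width in widths:
        W = rng.standard_normal((h.shape[1], width)) / np.sqrt(h.shape[1])
        h = np.maximum(h @ W, 0.0)  # ReLU (Lipschitz) activation
    W_out = rng.standard_normal((h.shape[1], d_out)) / np.sqrt(h.shape[1])
    return h @ W_out


# Monte Carlo check of approximate Gaussianity at a few sphere points.
# This only probes finite-dimensional marginals; the paper's bound is the
# stronger W1 distance with respect to the sup-norm over the whole sphere.
x = sphere_points(num_points=3, n=2)
draws = np.stack([wide_random_network(x, widths=[512, 512], d_out=1)[:, 0]
                  for _ in range(2000)])
print("empirical covariance across the 3 points:\n", np.cov(draws.T))
```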
The results provide quantitative information about when the limiting Gaussian behavior of wide random neural networks "kicks in", with no restriction on the depth and only mild assumptions on the activation function (Lipschitz continuity) and the weight distributions (moment conditions).