Core Concepts
Neural networks can be mathematically modeled using convection-diffusion equations, providing a unified framework for understanding and improving network structures.
Abstract
The paper explores the use of convection-diffusion equations to model neural networks.
It discusses the theoretical foundation and practical applications of this framework.
Various methods, such as Gaussian noise injection, dropout of hidden units, and randomized smoothing, are interpreted within this framework.
Comparison with scale-space theory assumptions is provided to highlight similarities and differences.
Experimental results in different domains showcase the effectiveness of the convection-diffusion approach.
Stats
For any T > 0, setting the time step ∆t = T/L lets the residual block in (1) be viewed as an explicit Euler discretization, with step ∆t, of the following ordinary differential equation (ODE): dx(t)/dt = v(x(t), t), x(0) = x0, t ∈ [0, T].
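The correspondence between residual blocks and Euler steps can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: the velocity field `velocity` is a hypothetical stand-in for a learned residual branch (here v(x, t) = -x, chosen because the ODE then has the known solution x(T) = x0·exp(-T)), and L is assumed to be the number of residual blocks.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical velocity field standing in for a learned residual branch.
    # Chosen as v(x, t) = -x so the exact ODE solution is x0 * exp(-t).
    return -x

def resnet_forward(x0, T=1.0, L=100):
    # Stack L residual blocks; each block is one explicit Euler step
    # x_{l+1} = x_l + dt * v(x_l, t_l) with dt = T / L.
    dt = T / L
    x = np.asarray(x0, dtype=float)
    for l in range(L):
        x = x + dt * velocity(x, l * dt)
    return x

x0 = np.array([1.0, -2.0])
x_T = resnet_forward(x0, T=1.0, L=1000)   # output of the "deep" network
exact = x0 * np.exp(-1.0)                 # exact solution of the ODE at t = T
```

As L grows, `x_T` approaches `exact`, which is the sense in which a deep residual network approximates the flow of the ODE.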
Furthermore, the connection between ODEs and partial differential equations (PDEs) through the well-known method of characteristics has motivated the analysis of ResNets from a PDE perspective. This includes theoretical analysis (Sonoda et al. [2019]), novel training algorithms (Sun Qi and Qiang [2020]), and improvements in the adversarial robustness of NNs (Wang et al. [2020a]).
The method of characteristics tells us that, along the curve defined by (2), the function value u(x, t) remains unchanged.
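A small numerical check can make this invariance concrete. The summary does not reproduce the PDE itself, so the sketch below assumes the standard one-dimensional transport equation ∂u/∂t + a·∂u/∂x = 0 with a constant velocity; the velocity `A`, the initial profile `f`, and the helper names are all hypothetical choices for illustration.

```python
import numpy as np

A = 0.7  # hypothetical constant velocity (assumption, not from the source)

def f(x):
    # Hypothetical initial profile u(x, 0) = f(x).
    return np.sin(x)

def u(x, t):
    # Exact solution of u_t + A * u_x = 0 with u(x, 0) = f(x).
    return f(x - A * t)

def characteristic(x0, t):
    # Solution of dx/dt = A with x(0) = x0.
    return x0 + A * t

x0 = 0.3
times = np.linspace(0.0, 2.0, 5)
# Evaluate u along the characteristic curve starting at x0.
values = np.array([u(characteristic(x0, t), t) for t in times])
```

Every entry of `values` equals `f(x0)`: the solution is carried unchanged along the curve, which is what allows the ODE flow to be read off from the PDE solution.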
Quotes
"NN can be viewed as the image u(·, t) of a mapping driven by a certain PDE."