Core Concepts
Theoretical insights reveal a deep connection between network initialization, architectural choices, and the optimization process, emphasizing the need for a holistic approach when designing parameter-efficient neural fields.
Abstract
The paper studies how the number of parameters in a neural field must scale with the size of the training set for gradient descent to converge to a global minimum. The key findings are:
For shallow networks employing sine, sinc, Gaussian, or wavelet activations and initialized with standard schemes like LeCun, Kaiming, or Xavier, the number of network parameters must scale super-linearly with the number of training samples for gradient descent to converge to a global minimum.
For deep networks with the same activations and initializations, the number of network parameters must scale super-quadratically with the number of training samples.
The authors propose a novel initialization scheme that significantly reduces the required overparameterization compared to standard practical initializations. Specifically, for shallow networks, the proposed initialization requires only linear scaling, and for deep networks, it requires quadratic scaling.
The theoretical insights are validated through extensive experiments on various neural field applications, including image regression, super-resolution, shape reconstruction, tomographic reconstruction, novel view synthesis, and physical modeling.
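As a rough illustration of the baseline setting in the findings above, the following sketch builds a shallow (one hidden layer) neural field with one of the named activations and one of the standard initializations. It is a minimal example under stated assumptions, not the authors' code: the names `init_std`, `ACTIVATIONS`, and `shallow_field` are hypothetical, the normal (rather than uniform) variant of each initializer is assumed, biases are omitted, and the wavelet activation and the paper's proposed initialization are left out because the summary does not specify them.

```python
import numpy as np


def init_std(scheme, fan_in, fan_out):
    """Standard deviation of the zero-mean normal variant of each standard scheme."""
    if scheme == "lecun":      # variance 1 / fan_in
        return np.sqrt(1.0 / fan_in)
    if scheme == "kaiming":    # variance 2 / fan_in
        return np.sqrt(2.0 / fan_in)
    if scheme == "xavier":     # variance 2 / (fan_in + fan_out)
        return np.sqrt(2.0 / (fan_in + fan_out))
    raise ValueError(f"unknown scheme: {scheme}")


# Three of the activations named in the summary (assumed forms, no frequency scaling).
ACTIVATIONS = {
    "sine": np.sin,
    "sinc": np.sinc,                        # NumPy's normalized sinc: sin(pi x) / (pi x)
    "gaussian": lambda x: np.exp(-x ** 2),
}


def shallow_field(coords, width, scheme="lecun", activation="sine", out_dim=1, seed=0):
    """Evaluate a randomly initialized shallow field; return (outputs, parameter count)."""
    rng = np.random.default_rng(seed)
    in_dim = coords.shape[1]
    w1 = rng.normal(0.0, init_std(scheme, in_dim, width), size=(in_dim, width))
    w2 = rng.normal(0.0, init_std(scheme, width, out_dim), size=(width, out_dim))
    hidden = ACTIVATIONS[activation](coords @ w1)
    return hidden @ w2, w1.size + w2.size


if __name__ == "__main__":
    # 100 random 2D coordinates, e.g. pixel locations for image regression.
    coords = np.random.default_rng(1).uniform(-1.0, 1.0, size=(100, 2))
    outputs, n_params = shallow_field(coords, width=64, scheme="xavier", activation="gaussian")
    print(outputs.shape, n_params)
```

In this sketch the parameter count grows linearly with `width`, so the scaling results can be read as statements about how quickly the width (and hence the parameter count) must grow as the number of training coordinates increases.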
Stats
The paper does not provide specific numerical data or metrics, but rather focuses on theoretical scaling laws and their empirical validation.
Quotes
"Theoretical results often find themselves in regimes that may not align with practical applications, rendering the theory insightful but lacking in predictiveness. To underscore the predictive efficacy of our theoretical framework, we design a novel initialization scheme and demonstrate its superior optimality compared to standard practices in the literature."
"Our findings reveal that neural fields, equipped with the activations sine, sinc, Gaussian, or wavelet and initialized using our proposed scheme, require a linear scaling with data in the shallow case and a quadratic scaling in the deep case, for gradient descent to converge to a global optimum."