The paper studies how neural fields must scale with the size of the training data, asking how many parameters the architecture needs for gradient descent to converge to a global minimum. The key findings are:
For shallow networks with sine, sinc, Gaussian, or wavelet activations, initialized with standard schemes such as LeCun, Kaiming, or Xavier, the number of network parameters must scale super-linearly with the number of training samples for gradient descent to converge.
For deep networks with the same activations and initializations, the number of parameters must scale super-quadratically with the number of training samples.
The authors propose a novel initialization scheme that significantly reduces the required overparameterization compared to these standard initializations: shallow networks then need only linear scaling, and deep networks only quadratic scaling. The four regimes are summarized in asymptotic form below.
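One compact way to write these regimes, with $m$ the number of network parameters and $n$ the number of training samples (the exact exponents, constants, and any logarithmic factors are those given in the paper's theorems; $\delta > 0$ below simply marks the super-linear and super-quadratic gaps), is roughly:

$$
\text{standard init: } m \gtrsim n^{1+\delta} \text{ (shallow)}, \quad m \gtrsim n^{2+\delta} \text{ (deep)}; \qquad
\text{proposed init: } m \gtrsim n \text{ (shallow)}, \quad m \gtrsim n^{2} \text{ (deep)}.
$$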
The theoretical insights are validated through extensive experiments on various neural field applications, including image regression, super-resolution, shape reconstruction, tomographic reconstruction, novel view synthesis, and physical modeling.
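As a concrete illustration of the image-regression setting, below is a minimal sketch of a shallow sine-activated neural field fit to pixel data by plain gradient descent. The width, frequency scale `omega`, learning rate, and step count are illustrative choices, not values from the paper; PyTorch's default `nn.Linear` initialization is a Kaiming-style scheme, i.e. one of the "standard" initializations discussed above, not the authors' proposed one.

```python
import torch
import torch.nn as nn


class ShallowSineField(nn.Module):
    """Shallow neural field: (x, y) coordinates -> RGB, with a sine activation.

    nn.Linear uses PyTorch's default Kaiming-style initialization, i.e. a
    'standard' scheme in the sense above, not the paper's proposed one.
    """

    def __init__(self, in_dim=2, width=256, out_dim=3, omega=30.0):
        super().__init__()
        self.omega = omega  # frequency scale of the sine activation (illustrative)
        self.hidden = nn.Linear(in_dim, width)
        self.out = nn.Linear(width, out_dim)

    def forward(self, coords):
        return self.out(torch.sin(self.omega * self.hidden(coords)))


def fit_image(coords, rgb, width=256, steps=2000, lr=1e-4):
    """Fit the field to (pixel coordinate, RGB) pairs with plain gradient descent."""
    model = ShallowSineField(width=width)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(coords) - rgb) ** 2).mean()  # mean-squared reconstruction error
        loss.backward()
        opt.step()
    return model
```

In this sketch, raising `width` relative to the number of training pixels is the overparameterization knob that the paper's scaling results are about.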