Exploring the Pivotal Role of Initial Scale in Governing the Training Dynamics of Overparameterized Neural Networks
The initial scale κ of the output function plays a pivotal role in governing the training dynamics of overparameterized neural networks: it enables rapid convergence to zero training loss irrespective of the specific initialization scheme employed.
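
To make the notion of an initial output scale concrete, the following is a minimal sketch and not the construction used in this work. It assumes a two-layer ReLU network f(x) = c · Σ_k a_k · relu(w_k · x) and takes κ to be the root-mean-square of f over unit-norm inputs at initialization. The function name `init_output_scale`, the parameters `prefactor`, `sigma_w`, and `sigma_a`, and this RMS definition of κ are all illustrative assumptions. The sketch shows only that two common initialization conventions lead to different initial output scales. It does not reproduce the convergence claim.

```python
import numpy as np


def init_output_scale(prefactor, sigma_w, sigma_a, m=1000, d=10, n=200, seed=0):
    """RMS output of a two-layer ReLU network at initialization.

    Hypothetical model (an assumption, not taken from the source):
        f(x) = prefactor * sum_k a_k * relu(w_k . x),
    with w_k ~ N(0, sigma_w^2 I) and a_k ~ N(0, sigma_a^2).
    """
    rng = np.random.default_rng(seed)
    # Random inputs, normalized to unit Euclidean norm.
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # Hidden-layer weights (m x d) and output weights (m,).
    W = sigma_w * rng.standard_normal((m, d))
    a = sigma_a * rng.standard_normal(m)
    # Network output on all n inputs at initialization.
    f = prefactor * np.maximum(X @ W.T, 0.0) @ a
    return float(np.sqrt(np.mean(f ** 2)))


m = 1000
# Same random draws (same seed); only the output prefactor differs.
kappa_ntk = init_output_scale(prefactor=1 / np.sqrt(m), sigma_w=1.0, sigma_a=1.0, m=m)
kappa_mf = init_output_scale(prefactor=1 / m, sigma_w=1.0, sigma_a=1.0, m=m)
print(f"1/sqrt(m) prefactor (NTK-style):   kappa ~ {kappa_ntk:.4f}")
print(f"1/m prefactor (mean-field-style):  kappa ~ {kappa_mf:.4f}")
```

Because both calls share the same seed, the two scales differ by exactly the ratio of the prefactors, sqrt(m). Under these assumptions, a single scalar such as κ can therefore summarize what distinguishes the two initialization schemes at the level of the output function.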