The early period of training significantly impacts out-of-distribution generalization in neural networks.
The training process of artificial neural networks can be interpreted as a dynamical system in graph space whose behavior, depending on the learning-rate regime, ranges from regular to chaotic.
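As a toy illustration of how the learning rate alone can switch an optimizer between regular and chaotic behavior, the sketch below treats one-dimensional gradient descent as an iterated map and estimates its Lyapunov exponent. Everything here is an assumption for illustration, not the source's method: the loss `L(x) = x**4/4 - x**2/2`, the helper names `gd_step` and `lyapunov_exponent`, and the use of a Lyapunov exponent as the regular-versus-chaotic diagnostic. The sketch also does not model the graph-space formulation.

```python
import math


def gd_step(x: float, lr: float) -> float:
    """One gradient-descent step on the toy loss L(x) = x**4 / 4 - x**2 / 2.

    Hypothetical toy loss (not from the source); its minima are at x = +1 and x = -1.
    """
    grad = x**3 - x  # dL/dx
    return x - lr * grad


def lyapunov_exponent(lr: float, x0: float = 0.3,
                      burn_in: int = 1_000, steps: int = 10_000) -> float:
    """Estimate the Lyapunov exponent of the GD map x -> x - lr * (x**3 - x).

    Negative values indicate regular (convergent) dynamics; positive values
    indicate sensitive dependence on initial conditions, i.e. chaos.
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = gd_step(x, lr)
    total = 0.0
    for _ in range(steps):
        # Derivative of the update map: d/dx [x - lr*(x^3 - x)] = 1 - lr*(3x^2 - 1)
        slope = abs(1.0 - lr * (3.0 * x * x - 1.0))
        total += math.log(max(slope, 1e-300))  # guard against log(0)
        x = gd_step(x, lr)
    return total / steps


if __name__ == "__main__":
    # For 0 < lr < 1 the minima x = +/-1 are stable (|1 - 2*lr| < 1), so the
    # exponent is negative; beyond lr = 1 they lose stability.
    for lr in (0.1, 0.9, 1.2, 1.5, 1.8):
        print(f"lr = {lr:.1f}   lyapunov ~ {lyapunov_exponent(lr):+.3f}")
```

Rescaling `x = y / sqrt(lr)` turns the update into the cubic map `y -> (1 + lr) * y - y**3`, which stays bounded for `lr < 2` from this starting point. Which learning rates above 1 actually yield a positive exponent (periodic windows can interrupt chaos) is best read off the sweep rather than assumed.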