The authors explain how spikes in the training loss during SGD are caused by catapult dynamics in the top eigenspace of the tangent kernel. They further demonstrate that these catapults improve generalization by increasing alignment with the Average Gradient Outer Product (AGOP).
In short: catapults in SGD lead to better generalization through feature learning.
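To make the AGOP concrete, here is a minimal sketch of computing a model's AGOP and its alignment with a reference matrix. It assumes PyTorch; the two-layer network and the cosine-similarity alignment measure are illustrative choices, not the authors' exact setup.

```python
# Sketch (not the authors' code): compute the Average Gradient Outer Product
# AGOP = (1/n) * sum_i grad_x f(x_i) grad_x f(x_i)^T, a d x d matrix, and
# measure its alignment with a reference matrix via cosine similarity.
import torch

def agop(model, X):
    """Average Gradient Outer Product of `model` over the rows of X (n x d)."""
    n, d = X.shape
    G = torch.zeros(d, d)
    for x in X:
        x = x.clone().requires_grad_(True)
        y = model(x.unsqueeze(0)).squeeze()   # scalar output f(x)
        (g,) = torch.autograd.grad(y, x)      # input gradient grad_x f(x)
        G += torch.outer(g, g)
    return G / n

def alignment(A, B):
    """Cosine similarity between two matrices, flattened into vectors."""
    return (A * B).sum() / (A.norm() * B.norm())

if __name__ == "__main__":
    torch.manual_seed(0)
    d = 10
    # Illustrative two-layer ReLU network with scalar output.
    model = torch.nn.Sequential(
        torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
    )
    X = torch.randn(100, d)
    G = agop(model, X)
    # Hypothetical reference matrix; in practice this could be the AGOP of a
    # well-generalizing model or the ground-truth feature matrix of the task.
    ref = torch.eye(d)
    print(f"AGOP alignment: {alignment(G, ref).item():.3f}")
```

Tracking this alignment across training would let one check whether it jumps at the loss spikes, which is the kind of evidence the summary attributes to the paper.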