The Neural Feature Ansatz (NFA), which posits a correlation in trained neural networks between the Gram matrix of a layer's weights (the neural feature matrix, NFM) and the average gradient outer product (AGOP), emerges from gradient descent aligning the weight matrices with pre-activation tangent kernel features.
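As a concrete illustration, the sketch below (not taken from the paper; all dimensions, names, and the toy network are assumptions) computes the first-layer NFM $W_1^\top W_1$ and the input-space AGOP for a small two-layer ReLU network, and reports their cosine correlation, the quantity the NFA predicts should be high after training.

```python
# Minimal sketch (illustrative, not the paper's code): compute the Neural Feature
# Matrix (NFM) and the Average Gradient Outer Product (AGOP) for a two-layer ReLU
# network and measure their correlation, as in the Neural Feature Ansatz.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 64, 500                         # input dim, hidden width, sample size (assumed)
W1 = rng.normal(size=(k, d)) / np.sqrt(d)     # first-layer weights (stand-in for trained weights)
w2 = rng.normal(size=(k,)) / np.sqrt(k)       # second-layer weights
X = rng.normal(size=(n, d))                   # inputs

def grad_x(x):
    """Gradient of f(x) = w2^T relu(W1 x) with respect to the input x."""
    pre = W1 @ x
    return W1.T @ (w2 * (pre > 0))

# Neural Feature Matrix: Gram matrix of the first-layer weights.
NFM = W1.T @ W1

# Average Gradient Outer Product over the sample.
AGOP = np.zeros((d, d))
for x in X:
    g = grad_x(x)
    AGOP += np.outer(g, g)
AGOP /= n

# NFA correlation: cosine similarity between the two matrices, viewed as vectors.
corr = np.sum(NFM * AGOP) / (np.linalg.norm(NFM) * np.linalg.norm(AGOP))
print(f"NFM-AGOP correlation: {corr:.3f}")
```

The same computation applies per layer in deeper networks, with the AGOP taken with respect to that layer's input rather than the network input.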
This research paper demonstrates that gradient descent on two-layer neural networks with a suitably regularized objective learns useful features not only early in training but also in the later stages, with the learned features converging to the ground-truth features.
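The toy experiment below is a hedged sketch of this kind of setup, not the paper's actual construction: the target, the weight-decay regularizer, and all hyperparameters are assumptions chosen for illustration. It trains a two-layer ReLU network by full-batch gradient descent on data generated from a planted single-index direction and then checks how well the top feature direction of the first layer aligns with that ground-truth direction.

```python
# Toy sketch (assumed setup): gradient descent with weight decay on a two-layer
# ReLU network fit to a planted single-index target y = relu(u^T x), followed by
# a check of how well the learned first-layer features align with u.
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 20, 32, 2000
u = rng.normal(size=d); u /= np.linalg.norm(u)   # planted ground-truth direction
X = rng.normal(size=(n, d))
y = np.maximum(X @ u, 0.0)                        # single-index targets

W1 = rng.normal(size=(k, d)) * 0.1
w2 = rng.normal(size=(k,)) * 0.1
lr, lam, steps = 0.05, 1e-3, 300                  # assumed hyperparameters

for _ in range(steps):
    pre = X @ W1.T                                # (n, k) pre-activations
    act = np.maximum(pre, 0.0)
    pred = act @ w2
    err = pred - y
    # Gradients of 0.5 * mean squared error plus L2 (weight-decay) regularization.
    grad_w2 = act.T @ err / n + 2 * lam * w2
    grad_pre = np.outer(err, w2) * (pre > 0)      # (n, k)
    grad_W1 = grad_pre.T @ X / n + 2 * lam * W1
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

# Alignment of the top feature direction of W1 with the planted direction u.
_, _, Vt = np.linalg.svd(W1, full_matrices=False)
print(f"|cos(top feature, u)| = {abs(Vt[0] @ u):.3f}")
```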