Half-Space Feature Learning Dynamics in Neural Networks
Neural networks can learn non-linear features that are effectively indicator functions for regions compactly described as intersections of half-spaces in the input space. This feature learning happens early in training and the dynamics of gradient descent impart a distinct clustering to the later layer neurons.