Unveiling the Implicit Bias in Next-Token Prediction Training Paradigm
The author explores the implicit bias present in next-token prediction training paradigms, focusing on the structural properties of weights and their convergence towards specific solutions.