Key concepts
High-level semantic concepts are encoded linearly in large language models, driven by the next-token prediction objective and the implicit bias of gradient descent.
Summary
Recent works argue that high-level semantic concepts are encoded linearly in large language models.
Linear representations are promoted by the next-token prediction objective and the implicit bias of gradient descent.
The latent variable model abstracts concept dynamics for analysis.
Log-odds matching and gradient descent bias contribute to linear representations.
Experiments confirm linear representations in simulated data and LLaMA-2.
Representations of separated concepts are observed to be orthogonal.
Multilingual embedding and Winograd Schema experiments validate theoretical predictions.
The analysis connects to causal representation learning and related literature.
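The linearity and orthogonality claims above can be illustrated with a toy check. This is a minimal sketch using synthetic vectors, not the paper's models or data: a concept (here a hypothetical "gender" direction) is linearly represented when embedding differences across concept pairs all point in roughly the same direction, and a separated concept (a hypothetical "tense" direction) should be near-orthogonal to it.

```python
import numpy as np

# Toy sketch with synthetic vectors only; all names here are illustrative.
rng = np.random.default_rng(0)
d = 64  # embedding dimension (arbitrary)

# Two hypothetical concept directions; made exactly orthogonal to mimic
# the "separated concepts" setting.
gender_dir = rng.normal(size=d)
gender_dir /= np.linalg.norm(gender_dir)
tense_dir = rng.normal(size=d)
tense_dir -= (tense_dir @ gender_dir) * gender_dir
tense_dir /= np.linalg.norm(tense_dir)

# Synthetic "embeddings": shared base + concept offset + small noise.
base = rng.normal(size=(5, d))
man = base + 0.02 * rng.normal(size=(5, d))
woman = base + gender_dir + 0.02 * rng.normal(size=(5, d))

diffs = woman - man
diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)

# Linearity: pairwise cosines of the difference vectors are near 1.
pairwise_cos = float((diffs @ diffs.T)[np.triu_indices(5, k=1)].mean())

# Orthogonality: the shared direction is near-orthogonal to tense_dir.
mean_diff = diffs.mean(axis=0)
mean_diff /= np.linalg.norm(mean_diff)
cross_cos = abs(float(mean_diff @ tense_dir))

print(f"mean pairwise cosine: {pairwise_cos:.2f}")
print(f"cosine vs. separated concept: {cross_cos:.2f}")
```

The first number being close to 1 is what "linear representation" means operationally; the second being close to 0 is the orthogonality of separated concepts.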
Quotes
"Linear representations emerge when learning from data matching the latent variable model."