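Online adaptation of language models, i.e. updating their parameters at test time as the text stream is read (also known as dynamic evaluation), improves predictive performance. It turns parameters into temporally changing states, which extends the effective context length by storing memory in the weights.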
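The predict-then-update loop behind this idea can be sketched with a toy model. This is a minimal illustration, not the method of any particular paper. It assumes a unigram model whose only parameters are a vector of logits, updated by one plain SGD step per token with a fixed learning rate. The function names `softmax` and `stream_log_loss` are hypothetical. Because the logits are carried from one step to the next, they act as a state that remembers the recent stream.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stream_log_loss(tokens: Sequence[int], vocab_size: int, lr: float) -> float:
    """Average negative log-likelihood of a token stream under a toy unigram
    model whose logits are adapted online (hypothetical illustration).

    lr = 0.0 recovers a static model with frozen parameters.
    """
    logits = [0.0] * vocab_size  # parameters double as state carried across time
    total_nll = 0.0
    for tok in tokens:
        probs = softmax(logits)
        # 1) Predict: score the current token before adapting to it.
        total_nll += -math.log(probs[tok])
        # 2) Adapt: one SGD step on this token's loss.
        #    d(-log softmax(logits)[tok]) / d logits = probs - one_hot(tok)
        for i in range(vocab_size):
            grad = probs[i] - (1.0 if i == tok else 0.0)
            logits[i] -= lr * grad
    return total_nll / len(tokens)


if __name__ == "__main__":
    # A stream with a distribution shift halfway through.
    stream = [3] * 50 + [1] * 50
    static = stream_log_loss(stream, vocab_size=5, lr=0.0)
    adaptive = stream_log_loss(stream, vocab_size=5, lr=0.5)
    print(f"static: {static:.3f}  adaptive: {adaptive:.3f}")
```

With `lr=0.0` every token keeps probability 1/5, so the static loss is ln 5. With a positive learning rate the logits track the recent tokens and the average loss on this shifted stream drops. The spike right after the shift shows the flip side: the state has to be partly overwritten before the model catches up.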