Pre-training loss predicts language model performance on downstream tasks, revealing emergent abilities.