Emergent Abilities in Smaller Generative Language Models Trained on Simplified Data
Downscaling language complexity during pre-training enables smaller generative language models to exhibit emergent zero-shot learning capabilities comparable to those of larger models trained on unrestricted language.
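
One common way to downscale language complexity is to restrict the pre-training corpus to a small, high-frequency vocabulary. The sketch below is a hypothetical illustration of that idea, not the paper's actual pipeline; the function names (`build_restricted_vocab`, `filter_to_simple_language`) and the `max_oov_ratio` parameter are assumptions introduced here for clarity.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase word tokenizer; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())

def build_restricted_vocab(corpus, vocab_size=2000):
    """Keep only the vocab_size most frequent word types in the corpus."""
    counts = Counter(w for sent in corpus for w in tokenize(sent))
    return {word for word, _ in counts.most_common(vocab_size)}

def filter_to_simple_language(corpus, vocab, max_oov_ratio=0.0):
    """Retain sentences whose words fall inside the restricted vocabulary,
    allowing at most max_oov_ratio out-of-vocabulary words per sentence."""
    kept = []
    for sent in corpus:
        words = tokenize(sent)
        if not words:
            continue
        oov = sum(w not in vocab for w in words)
        if oov / len(words) <= max_oov_ratio:
            kept.append(sent)
    return kept

if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "Quantum chromodynamics describes the strong interaction.",
        "The dog ran to the park.",
    ]
    vocab = build_restricted_vocab(corpus, vocab_size=10)
    # Only sentences built from the most frequent words survive the filter.
    print(filter_to_simple_language(corpus, vocab))
```

A filter like this trades corpus size for simplicity: the surviving text is smaller but lexically and syntactically easier, which is the regime in which smaller models are claimed to show emergent zero-shot behavior.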