A Mathematical Theory for Learning Semantic Capabilities in Abstract Language Models
A mathematical theory is developed to explain how learned skills emerge in large language models once the number of system parameters and the size of the training data surpass certain thresholds.