Transformers Generalize Hierarchically Without Explicit Structural Bias: Understanding the Role of Training Objectives
Transformers trained with the language modeling objective consistently learn to generalize hierarchically, even without any explicit structural bias; models trained with other objectives, such as sequence-to-sequence or prefix language modeling, do not exhibit this behavior as reliably.