Uncovering the Mechanisms Behind Factual Recall in Transformer-Based Language Models
Transformer-based language models achieve factual recall through a sequential process: task-specific attention heads extract the argument from the context, the MLP layer activates the extracted argument, and a task-aware function is then applied to produce the recalled fact.
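To make the proposed pipeline concrete, the following is a minimal, hypothetical sketch of the three-step view in PyTorch: an attention head copies the argument token's representation into the final position of the residual stream, and an MLP then combines it with a task representation to apply a task-aware function. Module names, dimensions, and the task embedding are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

d_model = 64  # hypothetical residual-stream width for illustration


class ArgumentExtractorHead(nn.Module):
    """Task-specific attention head (step 1): attends from the final position
    back over the prompt and copies the argument's representation into the
    final token's residual stream."""

    def __init__(self, d_model):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)

    def forward(self, resid):                     # resid: (batch, seq, d_model)
        query = resid[:, -1:, :]                  # query from the final position
        extracted, _ = self.attn(query, resid, resid)
        return extracted.squeeze(1)               # (batch, d_model): extracted argument


class TaskAwareMLP(nn.Module):
    """MLP layer (steps 2-3): activates the extracted argument and applies a
    task-aware function, mapping (task, argument) to an output direction."""

    def __init__(self, d_model):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, argument, task_embedding):
        # Condition the MLP on both the argument and the task representation.
        return self.net(torch.cat([argument, task_embedding], dim=-1))


# Usage: run the stages in sequence on a dummy residual stream.
resid = torch.randn(1, 8, d_model)   # residual stream for an 8-token prompt
task = torch.randn(1, d_model)       # hypothetical embedding of the task (e.g. "capital of")

argument = ArgumentExtractorHead(d_model)(resid)           # step 1: extract argument
answer_direction = TaskAwareMLP(d_model)(argument, task)   # steps 2-3: activate + apply
```

This toy decomposition only mirrors the division of labor described above; in a real model the same roles are distributed across many heads and MLP layers rather than two standalone modules.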