Core Concepts
This research paper investigates the internal mechanisms by which small and large transformer models solve propositional logic problems, revealing distinct reasoning pathways and specialized attention head functions.
Stats
When the linear chain is queried, the layer-3 attention heads in the small transformer focus predominantly on the QUERY position, placing over 90% of their attention weight there on average (based on 1k test samples).
In contrast, when the LogOp chain is queried, the same layer-3 heads place less than 5% of their attention on the QUERY position on average. A sketch of how such a per-head statistic could be computed is given below.
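The following is a minimal sketch, not taken from the paper, of how the average attention a head places on a given token position might be measured. The tensor shape, the attending position (assumed here to be the final token), and the QUERY position index are all assumptions made for illustration.

```python
import torch

def mean_attention_to(attn: torch.Tensor, from_pos: int, to_pos: int) -> torch.Tensor:
    """Mean attention weight each head assigns to `to_pos` when attending from `from_pos`.

    attn: (batch, n_heads, seq_len, seq_len) attention pattern of one layer,
          rows already softmax-normalized.
    Returns: (n_heads,) mean weight per head across the batch.
    """
    return attn[:, :, from_pos, to_pos].mean(dim=0)

# Toy illustration with random patterns standing in for the small transformer's
# layer-3 attention over the test set.
batch, n_heads, seq_len = 8, 4, 16
raw = torch.rand(batch, n_heads, seq_len, seq_len)
attn = raw / raw.sum(dim=-1, keepdim=True)   # rows sum to 1, like softmax output
query_pos = seq_len - 1                      # assumption: QUERY token is the last position
per_head = mean_attention_to(attn, from_pos=seq_len - 1, to_pos=query_pos)
print(per_head)  # fraction of attention each head keeps on the QUERY position, on average
```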
An affine classifier trained on the layer-2 residual stream at the QUERY position for the linear chain achieves above 97% test accuracy in predicting the start of the linear chain (trained and tested on 5k samples).
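As a rough illustration of the probing setup, the sketch below trains an affine classifier (a multinomial logistic regression, i.e. Wx + b followed by a softmax) on cached activations. The data here is random placeholder data; in the paper's setting, X would be layer-2 residual-stream vectors at the QUERY position and y the identity of the linear chain's starting variable. Dimensions and sample counts are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for cached residual-stream activations and labels.
n_samples, d_model, n_vars = 5000, 256, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, d_model))          # layer-2 residual stream at QUERY (placeholder)
y = rng.integers(0, n_vars, size=n_samples)        # index of the chain's start variable (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=2000)          # affine map plus softmax over classes
probe.fit(X_tr, y_tr)
print(f"probe test accuracy: {probe.score(X_te, y_te):.3f}")  # near chance on random data
```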
Mistral-7B achieves above 70% accuracy on a minimal version of the propositional logic problem.
In Mistral-7B, attention head (12,9) places, on average, above 90% of its attention weight on the "conclusion" variable of the queried rule.
In Mistral-7B, attention head (13,11) assigns above 50% of its attention weight to the QUERY position; on average, its weight at QUERY is about 10 times larger than its second-largest attention weight.
In Mistral-7B, attention heads (16,12) and (16,14) place greater than 56% and 70% of their attention, respectively, on the fact section of the context.
In Mistral-7B, when the model solves the problem correctly, attention head (19,8)'s top-2 attention weights always fall on the correct starting node of the queried rule and on the correct variable in the fact section; together, these two token positions receive more than 60% of its total attention over the relevant context on average.
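For context on how such head-level attention weights can be read out of a Hugging Face checkpoint, here is a hedged sketch. The checkpoint name, the illustrative prompt, the choice of the final token as the attending position, and zero-based (layer, head) indexing are all assumptions, not details from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # assumed checkpoint; the paper's exact variant may differ
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",           # eager attention so per-head weights are returned
)

# Illustrative prompt only; the paper's propositional-logic prompts are formatted differently.
prompt = "Rules: if A then B. Facts: A is true. Question: is B true? Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

layer, head = 12, 9                         # e.g. head (12,9) from the stats above, zero-based
# out.attentions[layer]: (batch, n_heads, seq_len, seq_len); read the row for the last token.
weights = out.attentions[layer][0, head, -1]
top_vals, top_idx = weights.topk(3)
for v, i in zip(top_vals.tolist(), top_idx.tolist()):
    print(f"{v:.3f} -> {tok.decode(inputs['input_ids'][0][i])}")
```

Averaging such per-position weights over a set of test prompts, restricted to the token positions of interest (QUERY, rule conclusions, fact section), yields statistics of the kind reported above.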