The paper presents evidence that large language models (LLMs) sometimes exploit a simple vector arithmetic mechanism to solve relational tasks during in-context learning. The key findings are:
The authors observe a distinct processing signature in the forward pass of LLMs, where the model first surfaces the argument to a function (e.g., the country name), then applies the function (e.g., retrieving the capital city) in a later layer. This signature is consistent across models and tasks.
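To make this two-stage signature concrete, here is a minimal logit-lens-style sketch that decodes every layer's hidden state at the final prompt position through the unembedding matrix. It assumes a HuggingFace transformers GPT-2 Medium checkpoint; the prompt, the expected argument token (" France"), and the expected answer token (" Paris") are illustrative choices, not the paper's exact setup.

```python
# Minimal logit-lens sketch (assumptions: HuggingFace transformers, GPT-2 Medium,
# and an illustrative capital-city prompt).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

prompt = "Q: What is the capital of France? A:"
ids = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**ids, output_hidden_states=True)

# Decode the final position of every layer through the final layer norm and
# the unembedding matrix, and watch which token is on top at each depth.
W_U = model.lm_head.weight        # (vocab, hidden)
ln_f = model.transformer.ln_f     # final layer norm
for layer, hidden in enumerate(out.hidden_states):
    logits = ln_f(hidden[0, -1]) @ W_U.T
    top = tok.decode(logits.argmax())
    print(f"layer {layer:2d}: {top!r}")
# The reported signature: the argument (" France") tends to surface in the
# middle layers before the answer (" Paris") takes over in later layers.
```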
By analyzing the feed-forward network (FFN) updates in GPT2-Medium, the authors show that the vector arithmetic mechanism is often implemented in mid-to-late layer FFNs. These FFN outputs can be isolated and applied to new contexts, demonstrating their modularity and ability to implement content-independent functions.
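The transfer experiment can be sketched roughly as follows: capture the MLP output vector at a mid-to-late layer while the model processes one prompt, then add that vector to the residual stream of a different prompt at the same layer and decode the result. This is a crude approximation under stated assumptions (HuggingFace GPT-2 Medium, an illustrative layer index, simple logit-lens decoding), not the paper's exact intervention procedure.

```python
# Rough sketch of transferring an FFN (MLP) output vector between contexts
# (assumptions: GPT-2 Medium via HuggingFace, illustrative layer and prompts).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()
LAYER = 18  # illustrative mid-to-late layer (GPT-2 Medium has 24 layers)

captured = {}
def capture_hook(module, inputs, output):
    # Save the MLP output at the final token position of the source prompt.
    captured["ffn"] = output[0, -1].detach()

handle = model.transformer.h[LAYER].mlp.register_forward_hook(capture_hook)
src = tok("Q: What is the capital of France? A:", return_tensors="pt")
with torch.no_grad():
    model(**src)
handle.remove()

# Apply the captured update in a new context: add it to the residual stream
# after the same layer and decode with the unembedding ("logit lens").
tgt = tok("Q: What is the capital of Poland? A:", return_tensors="pt")
with torch.no_grad():
    out = model(**tgt, output_hidden_states=True)
resid = out.hidden_states[LAYER + 1][0, -1]   # stream after block LAYER
patched = resid + captured["ffn"]
logits = model.transformer.ln_f(patched) @ model.lm_head.weight.T
print(tok.decode(logits.argmax()))  # checks whether the update pushes toward " Warsaw"
```

If the FFN output really encodes a content-independent "get capital" function, the same vector should steer the new context toward its own answer rather than toward the source context's answer.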
The authors find that this vector arithmetic mechanism is specific to tasks that require retrieving information from the model's pretraining memory, rather than from the local context. When the answer can be directly copied from the prompt, the FFN updates do not play a significant role.
The results contribute to the understanding of how LLMs solve tasks, suggesting that despite their complexity, they can sometimes rely on familiar and intuitive algorithms like vector arithmetic. This offers insights into the interpretability of LLMs and potential methods for detecting and preventing unwanted behaviors.