This research paper delves into the inner workings of language models (LMs) when tasked with fact completion. It challenges the prevailing assumption that correct predictions primarily stem from factual recall. Instead, the authors argue that LMs employ a combination of mechanisms, including heuristics and guesswork, to arrive at their answers.
The paper introduces four distinct prediction scenarios: generic language modeling, guesswork, heuristics recall, and exact fact recall. Each scenario represents a different level of model reliability and information processing. To enable precise analysis, the authors propose a novel method called PRISM for creating model-specific datasets. These datasets contain examples tailored to each prediction scenario, allowing for a more granular study of LM behavior.
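To make the scenario distinction concrete, below is a minimal sketch of how a model's predictions might be partitioned into the four scenarios. The selection criteria used here (a confidence threshold, paraphrase consistency, and a lexical-overlap check) are illustrative assumptions for the sketch only, not the actual PRISM diagnostic tests described in the paper.

```python
# Hypothetical sketch: sorting a model's fact-completion predictions into the
# four prediction scenarios. The criteria below are illustrative placeholders,
# not the PRISM selection rules from the paper.
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str                   # fact-completion query, e.g. "Sweden's capital is"
    answer: str                   # gold object, e.g. "Stockholm"
    prediction: str               # the model's top prediction for this prompt
    confidence: float             # probability assigned to that prediction
    paraphrase_consistent: bool   # same prediction across paraphrased prompts
    lexical_overlap: bool         # prediction shares surface cues with the subject

def assign_scenario(ex: Example) -> str:
    """Assign one of the four prediction scenarios (placeholder criteria)."""
    if ex.prediction != ex.answer:
        return "generic_language_modeling"   # no correct fact produced
    if ex.confidence < 0.2:
        return "guesswork"                   # correct, but with low confidence
    if ex.lexical_overlap and not ex.paraphrase_consistent:
        return "heuristics_recall"           # correct via surface-level cues
    return "exact_fact_recall"               # correct, confident, and robust

dataset = {"generic_language_modeling": [], "guesswork": [],
           "heuristics_recall": [], "exact_fact_recall": []}
# Examples would come from probing a specific model on a fact-completion set.
for ex in []:  # placeholder iterable of Example objects
    dataset[assign_scenario(ex)].append(ex)
```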
The researchers apply causal tracing (CT), a popular interpretability method, to analyze the different prediction scenarios. Their findings reveal that while CT produces distinct results for each scenario, aggregated results from mixed examples tend to be dominated by the exact fact recall scenario. This highlights the importance of disentangling and interpreting LM behavior based on specific prediction scenarios.
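For readers unfamiliar with causal tracing, the sketch below outlines the general technique as commonly described in the interpretability literature: corrupt the subject-token embeddings with noise, then restore individual clean hidden states and measure how much of the correct answer's probability recovers. The model, prompt, token positions, answer, and noise scale are placeholders; this is not the paper's exact experimental setup.

```python
# A minimal causal-tracing sketch, not the paper's exact implementation.
# Prompt, subject token positions, answer, and noise scale are illustrative
# placeholders; the recipe (corrupt subject embeddings, then restore clean
# hidden states one layer/position at a time) follows the standard procedure.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in the city of"    # placeholder query
subject_positions = [1, 2, 3]                             # placeholder subject-token indices
answer_id = tokenizer(" Paris")["input_ids"][0]           # placeholder gold answer

inputs = tokenizer(prompt, return_tensors="pt")

# 1) Clean run: cache hidden states at every layer and the answer probability.
with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True)
clean_hidden = clean.hidden_states                         # (num_layers + 1) tensors
clean_prob = torch.softmax(clean.logits[0, -1], dim=-1)[answer_id]

# 2) Corrupted run: add Gaussian noise to the subject-token embeddings.
with torch.no_grad():
    embeds = model.transformer.wte(inputs["input_ids"]).clone()
    embeds[:, subject_positions] += 0.1 * torch.randn_like(embeds[:, subject_positions])

def answer_prob(layer_idx=None, position=None):
    """Run on corrupted embeddings, optionally restoring one clean hidden state."""
    handles = []
    if layer_idx is not None:
        def restore(module, inp, out):
            hidden = out[0].clone()
            hidden[:, position] = clean_hidden[layer_idx + 1][:, position]
            return (hidden,) + out[1:]
        handles.append(model.transformer.h[layer_idx].register_forward_hook(restore))
    with torch.no_grad():
        out = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])
    for h in handles:
        h.remove()
    return torch.softmax(out.logits[0, -1], dim=-1)[answer_id]

corrupted_prob = answer_prob()
# 3) Indirect effect of each (layer, position): how much of the answer's
#    probability recovers when that single clean hidden state is restored.
effects = {
    (layer, pos): (answer_prob(layer, pos) - corrupted_prob).item()
    for layer in range(len(model.transformer.h))
    for pos in range(inputs["input_ids"].shape[1])
}
```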
The paper concludes that relying solely on accuracy as a metric for evaluating LM fact completion can be misleading. It advocates for a more nuanced understanding of LM behavior by considering the various mechanisms at play. The authors' proposed PRISM datasets and their analysis using CT provide valuable tools for achieving this goal.