Core Concepts
Large language models frequently exhibit factual hallucinations, even when they possess the relevant knowledge. This study investigates the models' internal inference dynamics during successful and failed knowledge recall to understand the mechanisms behind such hallucinations.
Abstract
The study investigates why large language models (LLMs) hallucinate factual information even when they possess the relevant knowledge. The researchers analyze the inference dynamics of LLMs to understand the underlying reasons for this behavior.
Key highlights:
Known-fact hallucination arises from failed knowledge recall. When the model generates an incorrect output, the correct answer reaches the top rank of the intermediate-layer predictions in only about 30% of layers, significantly lower than the roughly 78% observed when the output is correct (see the logit-lens sketch after this list).
The Multi-Layer Perceptron (MLP) modules have a more significant impact on incorrect outputs than the attention modules. The MLPs not only suppress the probability of the correct answer when the model produces incorrect outputs but also contribute to generating the erroneous token at the final decoding layer (see the module-attribution sketch after this list).
Output tokens follow distinct inference-dynamics patterns. When the output is correct, information about the output token rises steeply in the middle-to-late layers of the residual stream, whereas erroneous outputs tend to emerge from shallower layers.
The dynamic patterns of output tokens enable accurate hallucination detection. Using the layer-wise dynamic curve of the output token as features, classifiers can be trained to distinguish whether the model is recalling or hallucinating, achieving an 88% detection success rate (a classifier sketch follows the summary paragraph below).
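To make the layer-wise measurements above concrete, here is a minimal logit-lens sketch that traces how strongly each layer's residual stream predicts a chosen answer token. GPT-2 via Hugging Face transformers, the Ottawa prompt, and the " Ottawa" answer token are illustrative assumptions, not the paper's exact models or measurement protocol.

```python
# Minimal logit-lens sketch: track how strongly each layer's residual
# stream predicts a given answer token. Model and prompt are stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any decoder-only LM with an unembedding head works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "The capital of Canada is the city of"
answer_id = tok.encode(" Ottawa")[0]  # first token of the answer string

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

# out.hidden_states: tuple of (n_layers + 1) tensors, shape [1, seq_len, d_model]
curve = []
for h in out.hidden_states:
    # Project the residual stream at the last position through the final
    # LayerNorm and the unembedding matrix ("logit lens").
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)
    curve.append(probs[0, answer_id].item())

for layer, p in enumerate(curve):
    print(f"layer {layer:2d}: p(' Ottawa') = {p:.4f}")
```

The rank statistic in the first highlight corresponds to checking, at each layer, whether the correct answer token is ranked first among all vocabulary tokens under this projection.

The MLP-versus-attention comparison can be approximated with a direct-contribution measurement: project each sublayer's output through the final LayerNorm and the unembedding and read off how much it moves the answer token's logit. This is a generic attribution sketch under assumed GPT-2 internals (hooks on block.attn and block.mlp), not the paper's exact procedure, and the LayerNorm makes the decomposition only approximate.

```python
# Approximate per-sublayer contribution to the answer token's logit,
# measured by projecting each attention/MLP output through ln_f + lm_head.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The capital of Canada is the city of"
answer_id = tok.encode(" Ottawa")[0]

captured = {}  # {(layer_idx, "attn"|"mlp"): sublayer output at the last position}

def make_hook(key):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output  # attn returns a tuple
        captured[key] = out[:, -1, :].detach()
    return hook

handles = []
for i, block in enumerate(model.transformer.h):
    handles.append(block.attn.register_forward_hook(make_hook((i, "attn"))))
    handles.append(block.mlp.register_forward_hook(make_hook((i, "mlp"))))

with torch.no_grad():
    model(**tok(prompt, return_tensors="pt"))
for h in handles:
    h.remove()

for (layer, kind), vec in sorted(captured.items()):
    logit = model.lm_head(model.transformer.ln_f(vec))[0, answer_id].item()
    print(f"layer {layer:2d} {kind:4s}: contribution to ' Ottawa' logit = {logit:+.3f}")
```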
The study provides insights into the internal mechanisms of LLMs during successful and failed knowledge recall, shedding light on the reasons for factual hallucinations and proposing a method to detect such hallucinations.
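A hedged sketch of the detection step described above: treat the layer-wise probability curve of the generated token as a feature vector and fit a binary classifier that separates recall from hallucination. The curves would come from a logit-lens-style extraction like the earlier sketch; the random placeholder data, scikit-learn logistic regression, and dataset size below are assumptions, and the 88% figure is the paper's reported result, not something this toy setup reproduces.

```python
# Train a simple classifier on layer-wise output-token probability curves
# to flag hallucinated generations. All data here is a placeholder.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

n_layers = 13  # e.g. embedding layer + 12 transformer layers for a GPT-2-sized model
rng = np.random.default_rng(0)

# In practice: each row of X is the per-layer probability curve of the
# generated token for one prompt, and y marks whether that generation was
# factually correct (1) or a hallucination (0).
X = rng.random((200, n_layers))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("detection accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```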
Example
The capital of Canada is Ottawa; Toronto is not. A model that answers "Toronto" here despite having encoded the correct fact illustrates known-fact hallucination.
Quotes
"Known fact hallucination arises from failed knowledge recall."
"MLP modules have a more significant impact on incorrect outputs than attention modules."
"The dynamic patterns of output tokens can be used for accurate hallucination detection in predictions."