Key Concepts
Large language models can confidently generate responses that are incorrect or nonsensical (hallucinations). This work proposes a principled procedure for deciding when the model should abstain from responding rather than risk hallucinating, using the model's self-consistency as a measure of confidence.
Summary
The authors develop a method to mitigate hallucinations in large language models (LLMs) by determining when the model should abstain from responding. The key ideas are:
- Use the LLM itself to self-evaluate the similarity between its sampled responses for a given query. This provides a measure of the model's confidence in its responses.
- Leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate).
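As an illustration of what the self-evaluation step might look like (the prompt wording and function below are ours and purely illustrative, not the paper's), the similarity judge can be the same LLM asked to compare two of its own sampled answers:

```python
def similarity_prompt(question, answer_a, answer_b):
    """Build a self-evaluation prompt asking the LLM whether two of its own
    sampled answers convey the same answer to the question.
    Illustrative wording only, not the paper's prompt."""
    return (
        f"Question: {question}\n"
        f"Answer 1: {answer_a}\n"
        f"Answer 2: {answer_b}\n"
        "Do Answer 1 and Answer 2 give the same answer to the question? "
        "Reply with 'yes' or 'no'."
    )
```

Mapping the reply ("yes" = 1, "no" = 0), or the model's probability of "yes", to a numeric score yields the kind of similarity function assumed in the sketch that follows the method steps below.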
The method works as follows:
- Generate multiple responses from the LLM for a given query.
- Compute a score based on the similarity between the responses, either by counting the number of similar responses (match count) or estimating the expected number of similar responses (expected match count).
- Use conformal prediction to determine a threshold on the score, below which the model should abstain from responding.
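A minimal Python sketch of this pipeline, under stated assumptions: `llm_sample` and `similarity` are hypothetical stand-ins for the model's sampler and its self-evaluated response similarity, `sim_threshold` and the labeled calibration data are placeholders, and the calibration loop is a simplified empirical-risk version of the conformal step rather than the paper's exact procedure:

```python
def match_count(query, responses, similarity, sim_threshold=0.5):
    """Self-consistency score: how many of the other sampled responses the
    similarity judge rates as matching the first one (the candidate answer)."""
    reference = responses[0]
    return sum(
        similarity(query, reference, other) >= sim_threshold
        for other in responses[1:]
    )


def calibrate_abstention_threshold(cal_scores, cal_is_hallucination, alpha=0.1):
    """Smallest score threshold whose answered calibration subset has an
    empirical hallucination rate <= alpha; a simplified stand-in for the
    paper's conformal risk control calibration, not its exact bound."""
    for t in sorted(set(cal_scores)):
        answered = [h for s, h in zip(cal_scores, cal_is_hallucination) if s >= t]
        if answered and sum(answered) / len(answered) <= alpha:
            return t
    return float("inf")  # no feasible threshold: abstain on every query


def answer_or_abstain(query, llm_sample, similarity, threshold, n_samples=5):
    """Sample several responses, score their self-consistency, and abstain
    (return None) whenever the score falls below the calibrated threshold."""
    responses = [llm_sample(query) for _ in range(n_samples)]
    score = match_count(query, responses, similarity)
    return responses[0] if score >= threshold else None
```

The expected-match-count variant would replace the hard comparison against `sim_threshold` with the soft similarity score itself, summed over the sampled responses.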
This approach is evaluated on closed-book, open-domain question answering datasets. It is shown to reliably bound the hallucination rate while abstaining significantly less often than baselines that use log-probability scores to quantify uncertainty.
The authors also provide a conformal-prediction-based method for calibrating the threshold used to decide whether two responses match, with theoretical guarantees on the accuracy of the match prediction.
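A sketch of one way such a match threshold could be calibrated with a split-conformal quantile, assuming LLM similarity scores are available for calibration pairs that human raters labeled as true matches (the function and setup are ours, not the paper's):

```python
import math


def calibrate_match_threshold(match_pair_scores, alpha=0.1):
    """Split-conformal lower quantile over similarity scores of calibration
    pairs labeled as true matches: a fresh, truly matching pair then scores
    at least the returned threshold with probability >= 1 - alpha
    (by exchangeability). A sketch, not the paper's exact recipe."""
    n = len(match_pair_scores)
    k = math.floor(alpha * (n + 1))  # conformal rank of the lower quantile
    if k < 1:
        # Too few calibration pairs to exclude anything at this alpha level;
        # fall back to the trivial always-match threshold.
        return float("-inf")
    return sorted(match_pair_scores)[k - 1]
```

The returned value could play the role of `sim_threshold` in the earlier sketch; the paper's guarantee concerns the accuracy of the match prediction itself, which may call for a different calibration target, so this is only an illustration of the conformal-quantile idea.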
Statistics
The abstention rate is the expected proportion of time the method chooses to abstain from responding.
The hallucination risk is the expected proportion of unfiltered hallucinations in the responses.
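In symbols (the notation is ours, a hedged formalization rather than the paper's exact definitions), these quantities and the conformal target can be written as:

```latex
% A(X) = 1 if the method abstains on query X, 0 otherwise;
% H(X) = 1 if the response it would return for X is a hallucination.
\text{abstention rate} \;=\; \mathbb{E}\big[A(X)\big]
\qquad\qquad
\text{hallucination risk} \;=\; \mathbb{E}\big[H(X)\,\bigl(1 - A(X)\bigr)\big]

% Conformal calibration targets a user-chosen tolerance \alpha:
\mathbb{E}\big[H(X)\,\bigl(1 - A(X)\bigr)\big] \;\le\; \alpha
```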
Quotes
"Large language models (LLMs) are excellent at next word prediction. At the same time, however, they are also prone to hallucination—that is, confidently generate responses that may look plausible on the surface, but that are actually incorrect or even nonsensical."
"Hallucinations can be extremely detrimental towards achieving trustworthy and reliable LLM performance, and hence avoiding or even detecting hallucinations has become one of the most important research topics in LLM research."