Key Idea
Large language models often generate false or unsubstantiated outputs, known as "hallucinations", which prevent their adoption in critical domains. This work proposes a general method to detect a subset of hallucinations, called "confabulations", by estimating the semantic entropy of model outputs.
Abstract
The paper addresses the problem of hallucinations in large language models (LLMs) such as ChatGPT and Gemini. LLMs can generate false or unsubstantiated answers, which poses a significant challenge to their adoption in diverse fields, including the legal, news, and medical domains.
The authors propose a new method to detect a subset of hallucinations, called "confabulations", which are arbitrary and incorrect generations. The key idea is to compute the uncertainty of the model's outputs at the level of meaning rather than specific sequences of words, using an entropy-based approach.
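To make the entropy-over-meanings idea concrete, the following is a minimal sketch, not the authors' implementation: sample several answers to the same prompt, group answers that express the same idea, and compute the entropy of the probability mass assigned to each group. The function and parameter names (`semantic_entropy`, `logprobs`, `same_meaning`) are illustrative assumptions.

```python
import math


def semantic_entropy(answers, logprobs, same_meaning):
    """Sketch of an entropy estimate over meanings rather than word sequences.

    answers      -- sampled answer strings for a single prompt
    logprobs     -- the model's log-probability for each sampled answer
    same_meaning -- callable(a, b) -> bool deciding whether two answers
                    express the same idea
    """
    # Group sampled answers into semantic clusters: an answer joins the first
    # cluster whose representative it shares a meaning with.
    clusters = []  # each cluster holds indices into `answers`
    for i, answer in enumerate(answers):
        for cluster in clusters:
            if same_meaning(answer, answers[cluster[0]]):
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Sum probability mass within each cluster, then normalise over clusters.
    masses = [sum(math.exp(logprobs[i]) for i in cluster) for cluster in clusters]
    total = sum(masses)
    probs = [m / total for m in masses]

    # Entropy over meanings: a high value means the samples spread across many
    # mutually inconsistent answers, a signal of likely confabulation.
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under this sketch, a low entropy means the sampled answers agree on one meaning even if their wording varies, while a high entropy flags the prompt as likely to produce a confabulation.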
The proposed method has several advantages:
- It works across datasets and tasks without requiring a priori knowledge of the task or task-specific data.
- It generalizes robustly to tasks not seen before.
- By detecting when a prompt is likely to produce a confabulation, the method helps users know when extra care is needed, opening up new uses for LLMs that their unreliability would otherwise prevent.
The authors highlight that encouraging truthfulness through supervision or reinforcement has only been partially successful, and a general method for detecting hallucinations in LLMs is needed, even for questions to which humans might not know the answer.
Statistics
Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often 'hallucinate' false outputs and unsubstantiated answers [3,4].
Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents [5] or untrue facts in news articles [6], and even posing a risk to human life in medical domains such as radiology [7].
Quotes
"Encouraging truthfulness through supervision or reinforcement has been only partially successful8."
"Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words."