Core Concepts
Hallucination in large language models is a prevalent issue, yet the NLP research community still lacks a cohesive framework and precise definitions for it.
Abstract
The paper provides a comprehensive analysis of how hallucination is conceptualized and measured in natural language processing (NLP) research. It examines how hallucination is defined and characterized across NLP subfields, including conversational AI, abstractive summarization, data-to-text generation, machine translation, image and video captioning, and data augmentation.
The analysis reveals a lack of consensus in the field, with 31 unique frameworks identified for defining hallucination. The definitions vary in their emphasis on attributes such as fluency, plausibility, confidence, intrinsic and extrinsic hallucinations, non-factuality, unfaithfulness, and nonsensicality.
The paper also highlights the need to consider the sociotechnical nature of hallucination, since the term carries diverse interpretations across disciplines such as psychology, neurology, and philosophy. The authors argue that the prevailing negative connotation of hallucination in NLP may lead to misconceptions and stigma.
The paper further examines existing metrics for quantifying hallucination, categorizing them into four main approaches: human evaluation, data-driven metrics, statistical metrics, and mixed methodologies. The analysis underscores the lack of standardized measurement, which contributes to the diversity of approaches in the field.
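As a concrete illustration of the statistical-metric category, the sketch below (not drawn from the paper) computes a naive token-level support score between a source text and a generated output; lower scores flag content with no lexical support in the source. The function and variable names are illustrative assumptions, not an established metric.

```python
# Illustrative sketch only: a naive statistical proxy for hallucination,
# not a metric proposed in the paper. It checks what fraction of generated
# tokens have lexical support in the source text.

def source_support_precision(source: str, generated: str) -> float:
    """Return the fraction of generated tokens that also appear in the source."""
    source_tokens = set(source.lower().split())
    generated_tokens = generated.lower().split()
    if not generated_tokens:
        return 1.0  # empty output contains no unsupported content
    supported = sum(tok in source_tokens for tok in generated_tokens)
    return supported / len(generated_tokens)


if __name__ == "__main__":
    src = "The company reported revenue of 4.2 billion dollars in 2021."
    gen = "The company reported revenue of 9 billion dollars in 2021."
    # The unsupported token ("9") lowers the score, hinting at possible hallucination.
    print(f"support precision: {source_support_precision(src, gen):.2f}")
```

Lexical overlap of this kind misses paraphrase and entailment, which is why the data-driven and mixed categories typically rely on learned models (for example, natural language inference or question answering) rather than surface matching.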
The practitioner survey offers insights into researchers' perceptions of hallucinations, how often they encounter them, and the potential societal ramifications, such as effects on education, scholarly work, code generation, and the spread of misinformation. The survey also reveals that some researchers view hallucinations as a manifestation of creativity, highlighting the need for a more nuanced understanding of the phenomenon.
Based on the findings, the paper outlines key challenges and offers recommendations: explicit documentation of the hallucination framework in use, consideration of user profiles and use cases, development of standardized definitions, and greater transparency in model decision-making, as sketched below.
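To make the "explicit documentation of hallucination frameworks" recommendation concrete, here is a minimal sketch of how a project might record its adopted framework. The field names and example values are assumptions for illustration, not a schema proposed by the authors.

```python
# Illustrative sketch only: a lightweight record for documenting which
# hallucination framework a project adopts. Field names are hypothetical.
from dataclasses import dataclass

@dataclass
class HallucinationFramework:
    definition: str          # the working definition of hallucination in use
    attributes: list[str]    # emphasized attributes, e.g. unfaithfulness, non-factuality
    measurement: str         # how hallucination is quantified (human, statistical, mixed)
    task: str                # the NLP subfield or task the definition targets
    users_and_use_case: str  # the user profile and use case being considered

# Hypothetical example for an abstractive summarization project.
framework = HallucinationFramework(
    definition="Generated content not supported by the source document.",
    attributes=["unfaithfulness", "extrinsic hallucination"],
    measurement="human evaluation supplemented by statistical metrics",
    task="abstractive summarization",
    users_and_use_case="editors reviewing model-produced news summaries",
)
print(framework)
```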
Survey Responses
"Hallucinations are just what is needed for models to be creative. In truth, unless AI text-generators are factually grounded with external knowledge for a specific field, they are just story generators which aim to be creative, hence"hallucinate.""
"It leads to problems if even I do not have any idea about the work. It is hard to differentiate if it is a genuine output or hallucination."
"I was asking an AI to generate me a piece of code. It ended up picking some code from one website and some from another and combining it. However those two websites (they were linked by chatgpt) we're using different versions of the library so the resulting code couldn't be executed."
Quotes
"Hallucination refers to the phenomenon where the model generates false information not supported by the input."
"Large Language Models often exhibit a tendency to produce exceedingly confident, yet erroneous, assertions commonly referred to as hallucinations."
"Models generate plausible-sounding but unfaithful or nonsensical information called hallucinations"