Core Concepts
This paper proposes an active learning framework to effectively and efficiently mitigate hallucinations in large language models (LLMs) for text summarization by selecting samples that exhibit diverse hallucination types for human annotation and subsequent finetuning.
Abstract
The paper addresses the problem of hallucinations in large language models (LLMs) for text summarization. Hallucinations refer to the generation of seemingly plausible but factually incorrect or unsupported outputs by LLMs.
The authors first revisit the typology of hallucinations in text summarization, identifying three main types: semantic frame errors, discourse errors, and content verifiability errors. They then propose an active learning framework to mitigate these hallucinations in LLMs, reducing the need for costly human annotations.
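To make the three error types concrete, here is a minimal sketch, not the paper's implementation, of how a per-summary hallucination score vector could be assembled; the dataclass and the toy token-overlap heuristic are illustrative placeholders for the off-the-shelf detection models the paper relies on.

```python
# Minimal sketch of a per-summary hallucination score vector; the toy
# token-overlap heuristic is a placeholder for real detection models
# (one detector per hallucination type).
from dataclasses import dataclass


@dataclass
class HallucinationScores:
    semantic_frame: float         # e.g., entities/relations unsupported by the source
    discourse: float              # e.g., wrong coreference or event/temporal ordering
    content_verifiability: float  # e.g., claims that cannot be checked against the source


def toy_unsupported_token_rate(source: str, summary: str) -> float:
    """Placeholder detector: fraction of summary tokens that never appear in the source."""
    source_tokens = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    return sum(t not in source_tokens for t in summary_tokens) / len(summary_tokens)


def score_summary(source: str, summary: str) -> HallucinationScores:
    """In practice each error type gets its own detector; the same toy heuristic stands in here."""
    rate = toy_unsupported_token_rate(source, summary)
    return HallucinationScores(rate, rate, rate)
```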
The key components of the framework are:
- Capturing diverse hallucination types: The authors leverage existing detection models to measure semantic frame, discourse, and content verifiability errors in the LLM-generated summaries.
- Hallucination diversity-aware sample selection: The authors propose a sample selection strategy called HADAS that prioritizes samples whose generated summaries exhibit severe hallucinations while also ensuring diversity in the types of hallucinations covered (see the sketch after this list).
- Iterative finetuning of LLMs: The selected and annotated samples are used to finetune the LLMs, with the goal of comprehensively mitigating hallucinations.
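The following is a minimal sketch, under stated assumptions rather than the paper's actual objective, of how such a selection and finetuning round could work. Each candidate is represented by a three-dimensional score vector (semantic frame, discourse, content verifiability), with higher values meaning more severe hallucination; the greedy type-rotation heuristic, the `annotate` callback, and the `finetune` stub are all hypothetical.

```python
# Sketch of hallucination diversity-aware sample selection plus one
# annotate-and-finetune round. Assumes precomputed 3-dim score vectors
# [semantic_frame, discourse, content_verifiability]; higher = more hallucinated.
from typing import Callable, Sequence


def dominant_type(scores: Sequence[float]) -> int:
    """Index of the hallucination type with the highest score for this sample."""
    return max(range(len(scores)), key=lambda i: scores[i])


def select_diverse_hallucinated(
    candidate_scores: list[Sequence[float]], budget: int
) -> list[int]:
    """Greedily pick severely hallucinated samples while rotating across
    hallucination types, so the annotation budget covers all three types."""
    remaining = set(range(len(candidate_scores)))
    selected: list[int] = []
    type_counts = [0, 0, 0]
    while remaining and len(selected) < budget:
        # Prefer the currently least-covered hallucination type (diversity term) ...
        target_type = min(range(3), key=lambda t: type_counts[t])
        # ... and within it, the sample with the most severe hallucination (severity term).
        best = max(remaining, key=lambda i: candidate_scores[i][target_type])
        selected.append(best)
        type_counts[dominant_type(candidate_scores[best])] += 1
        remaining.remove(best)
    return selected


def active_learning_round(
    candidate_scores: list[Sequence[float]],
    budget: int,
    annotate: Callable[[list[int]], list[str]],
    finetune: Callable[[list[str]], None],
) -> None:
    """One iteration: select, send to annotators, finetune the LLM on the corrections."""
    chosen = select_diverse_hallucinated(candidate_scores, budget)
    corrected_summaries = annotate(chosen)   # human-written or corrected references
    finetune(corrected_summaries)            # e.g., supervised finetuning on corrections


if __name__ == "__main__":
    # Tiny demo with made-up score vectors; annotation and finetuning are stubbed out.
    scores = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.2, 0.7], [0.85, 0.2, 0.3]]
    picked = select_diverse_hallucinated(scores, budget=3)
    print("selected sample indices:", picked)  # covers all three hallucination types
```

The greedy type-rotation above is only one way to encode a diversity term; the paper's actual selection objective may trade off severity and diversity differently.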
Extensive experiments on three datasets and different backbone LLMs demonstrate the effectiveness of the proposed HADAS method in alleviating hallucinations while maintaining high summarization quality, outperforming both random sampling and existing diversity-based approaches.
Quotes
"Large Language Models (LLMs) have shown propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported."
"Existing methods for hallucination mitigation often focus on finetuning LLMs with human feedback or human-annotated samples to align the models' outputs with human-plausible content."