Core Concepts
This paper proposes using "multicalibration" to produce interpretable and reliable confidence scores for outputs generated by large language models (LLMs); these scores can help detect hallucinations.
Summary
The paper introduces multicalibration techniques to produce calibrated probabilities that a generated LLM response is a hallucination. Unlike probabilities from conventional calibration methods, multicalibrated probabilities are calibrated not just marginally but also conditionally on various properties of the instance, allowing them to serve as more refined risk measures.
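For concreteness, the standard multicalibration requirement can be written as follows; the notation here is ours and is only a sketch of the condition the paper builds on, where y is the binary correctness label, f is the confidence score, and 𝒢 is a collection of (possibly intersecting) groups:

```latex
% Marginal calibration: correct on average for each predicted value v
\mathbb{E}\bigl[\, y - f(x) \mid f(x) = v \,\bigr] \approx 0 \quad \text{for all } v,
% Multicalibration: correct within every group G of a collection \mathcal{G}
\mathbb{E}\bigl[\, y - f(x) \mid f(x) = v,\ x \in G \,\bigr] \approx 0 \quad \text{for all } v \text{ and all } G \in \mathcal{G}.
```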
The key contributions are:
- Applying multicalibration techniques to hallucination detection in LLMs, addressing the challenge of forming meaningful "groups" to multicalibrate over by clustering prompts and by eliciting self-annotations from the model (a minimal sketch of this kind of procedure appears after this list).
- Introducing novel variations of multicalibration methods, including Linear Scaling and Early Stopping, which yield substantial performance enhancements.
- Systematically evaluating these techniques across diverse LLMs and question answering datasets, demonstrating their efficacy in calibration and overall performance compared to existing baselines.
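The sketch below illustrates the iterative, group-wise patching style of multicalibration that the contributions build on. It assumes initial confidence scores, binary correctness labels, and a boolean group-membership matrix (e.g. derived from prompt clustering and self-annotation); the function name `multicalibrate`, its parameters (`n_bins`, `tol`, `max_rounds`), and the specific update rule are our illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def multicalibrate(scores, labels, groups, n_bins=10, tol=0.02, max_rounds=200):
    """Iteratively patch scores until they are approximately calibrated
    within every (group, score-bucket) cell.

    scores : (n,) initial confidence scores in [0, 1]
    labels : (n,) binary correctness labels (1 = not a hallucination)
    groups : (n, k) boolean membership matrix, one column per group
             (e.g. prompt clusters, self-annotation flags)
    """
    f = scores.astype(float).copy()
    for _ in range(max_rounds):  # stop after a bounded number of patching rounds
        bins = np.clip((f * n_bins).astype(int), 0, n_bins - 1)
        worst, patch = 0.0, None
        for g in range(groups.shape[1]):
            for b in range(n_bins):
                cell = groups[:, g] & (bins == b)
                if not cell.any():
                    continue
                # calibration gap of this (group, bucket) cell
                gap = labels[cell].mean() - f[cell].mean()
                weight = cell.mean()  # weight by cell mass so tiny cells don't dominate
                if abs(gap) * weight > worst:
                    worst, patch = abs(gap) * weight, (cell, gap)
        if patch is None or worst < tol:
            break  # all cells are approximately calibrated
        cell, gap = patch
        # shift the worst cell toward its empirical correctness rate
        f[cell] = np.clip(f[cell] + gap, 0.0, 1.0)
    return f
```

In this sketch, an Early Stopping-style rule corresponds to halting the loop once the worst weighted calibration gap falls below `tol` (ideally measured on held-out data), and a Linear Scaling-style variant would replace the additive patch with a per-group linear adjustment of the scores; both are loose analogues of the paper's variants, not their exact definitions.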
The paper also discusses the importance of calibration for the trustworthy and ethical deployment of LLMs, and provides an extensible framework that can be further improved through new grouping strategies.
Statistics
This summary does not reproduce explicit numerical results or statistics; the focus here is on the paper's methodological contributions.
Quotes
"Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data."
"Producing 'risk scores' for hallucinations can provide an interpretable measure of risk which can be exposed to the user (e.g. through a coloring scheme, as in Figure 1) to communicate the risk associated with the generated content."