Core Concepts
Confidence estimation and calibration are crucial for improving the reliability of large language models (LLMs), since calibrated confidence scores make it possible to detect and mitigate erroneous or biased outputs.
Abstract
This survey explores confidence estimation and calibration in LLMs, covering fundamental concepts, challenges, methods, applications, and future directions. The content is structured as follows:
Introduction to Confidence Estimation and Calibration in LLMs.
Preliminaries and Background covering basic concepts, metrics, and methods.
White-Box Methods for Confidence Estimation, including logit-based, internal-state-based, and semantics-based methods (a logit-based sketch follows this list).
Black-Box Methods for Confidence Estimation, including linguistic-confidence methods, consistency-based estimation, and surrogate models (verbalized-confidence and consistency sketches follow this list).
Calibration Methods for improving generation quality and linguistic confidence (a temperature-scaling sketch follows this list).
Applications such as hallucination detection, ambiguity detection, and uncertainty-guided data exploitation.
Future Directions focusing on comprehensive benchmarks, multi-modal LLMs, and calibration to human variation.
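As a concrete illustration of the logit-based white-box family above, a minimal sketch: turn the per-token log-probabilities that a white-box model exposes into a length-normalized sequence confidence. The function name and the plain-list input are illustrative assumptions, not a formulation taken from the survey.

```python
import math

def logit_based_confidence(token_logprobs: list[float]) -> float:
    """Length-normalized sequence confidence from per-token log-probabilities.

    Exponentiating the mean token log-likelihood gives a score in (0, 1]
    that is comparable across generations of different lengths.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# Example: per-token log-probs as exposed by a white-box model.
print(logit_based_confidence([-0.1, -0.3, -0.05]))  # ~0.86
```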
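For the linguistic-confidence methods in the black-box list, the model is prompted to verbalize its own confidence alongside its answer. The sketch below assumes a generic `generate(prompt) -> str` callable and an illustrative prompt template; neither is a specific API or wording from the survey.

```python
import re
from typing import Callable, Optional

# Illustrative template; the exact elicitation wording is an assumption.
CONFIDENCE_PROMPT = (
    "{question}\n"
    "Give your answer, then state how confident you are that it is correct "
    "as a percentage, formatted exactly as 'Confidence: NN%'."
)

def verbalized_confidence(generate: Callable[[str], str],
                          question: str) -> tuple[str, Optional[float]]:
    """Elicit an answer plus a self-reported confidence from a black-box model."""
    reply = generate(CONFIDENCE_PROMPT.format(question=question))
    match = re.search(r"Confidence:\s*(\d{1,3})%", reply)
    confidence = int(match.group(1)) / 100 if match else None
    return reply, confidence
```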
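The consistency-based black-box methods can be illustrated by sampling the model several times at nonzero temperature and treating answer agreement as the confidence proxy. Again, `generate` is a stand-in for any black-box sampling call, and the exact-match normalization is a simplifying assumption.

```python
from collections import Counter
from typing import Callable

def consistency_confidence(generate: Callable[[str], str],
                           prompt: str,
                           n_samples: int = 10) -> tuple[str, float]:
    """Sample n answers; confidence = frequency of the most common answer."""
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples
```

Richer variants cluster semantically equivalent answers rather than exact-matching strings.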
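Among calibration methods, temperature scaling is the classic post-hoc baseline: divide logits by a scalar T fitted on held-out data so that softmax probabilities better match observed accuracy. A minimal NumPy sketch fitting T by grid search over negative log-likelihood; the survey covers a broader family of methods, and the grid-search fit and search range here are assumptions made for brevity.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits: np.ndarray, labels: np.ndarray) -> float:
    """Grid-search a scalar temperature minimizing held-out NLL.

    logits: (N, C) pre-softmax scores; labels: (N,) true class indices.
    """
    best_t, best_nll = 1.0, np.inf
    for t in np.linspace(0.5, 5.0, 91):  # search range is illustrative
        probs = softmax(logits / t)
        nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t
```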
Key Statements
"Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains."
"Confidence (or uncertainty) estimation is crucial for tasks like out-of-distribution detection and selective prediction."
"The output space of these models is significantly larger than that of discriminative models."
"Model calibration focuses on aligning predictive probabilities to actual accuracy."
References
Brown, Tom B., et al. (2020). "Language models are few-shot learners."
Baan, Joris, et al. (2023). "Uncertainty in natural language generation: From theory to applications."