Core Concepts
Epistemic uncertainty plays a fundamental role in decision-making tasks with Large Language Models, and actively incorporating it through Thompson Sampling policies can significantly improve performance compared to greedy approaches.
Abstract
The paper investigates the importance of epistemic uncertainty estimation in decision-making tasks that use natural language as input, with a focus on the contextual bandit problem.
Key highlights:
- Large Language Models (LLMs) have become the norm for such tasks, but existing approaches do not explicitly estimate the epistemic uncertainty of the agent.
- The authors compare a greedy LLM bandit policy, which always selects the action with the highest predicted reward, to LLM bandit policies that actively exploit uncertainty estimates via Thompson Sampling.
- They adapt several epistemic uncertainty estimation techniques to LLMs, including Dropout, Laplace Approximation, and Epinets.
- Experiments on a real-world toxic content detection task show that the Thompson Sampling policies significantly outperform the greedy baseline, highlighting the benefits of modeling uncertainty for exploration in bandit problems with text and LLMs.
- The findings suggest that uncertainty should play a more central role in developing LLM-based agents for decision-making tasks.
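The paper's policies are built on LLM reward models, but the core contrast between greedy selection and Thompson Sampling can be illustrated with a minimal sketch. The example below (an assumption for illustration, not the paper's setup) uses a classic Beta-Bernoulli bandit: instead of acting on point estimates, each round samples a plausible mean reward per arm from the posterior, so arms the agent is still uncertain about keep getting explored.

```python
import numpy as np

def thompson_bernoulli(true_probs, horizon, rng):
    """Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Illustrative sketch only: the paper's agents estimate epistemic
    uncertainty with LLMs (Dropout, Laplace, Epinets), not Beta posteriors.
    """
    k = len(true_probs)
    successes = np.ones(k)  # Beta alpha parameters (prior pseudo-count)
    failures = np.ones(k)   # Beta beta parameters (prior pseudo-count)
    counts = np.zeros(k, dtype=int)
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean reward per arm from the posterior; this
        # sampled draw is where epistemic uncertainty drives exploration.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))
        reward = int(rng.random() < true_probs[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        counts[arm] += 1
        total_reward += reward
    return counts, total_reward

rng = np.random.default_rng(0)
counts, reward = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000, rng=rng)
```

A purely greedy policy in the same setting would commit to whichever arm looked best early on; Thompson Sampling instead concentrates its pulls on the truly best arm as the posterior tightens.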
Stats
The dataset consists of around 136,000 comments, each associated with a "hate speech score". Comments with a score > 0.5 are considered "toxic" (around 36% of the comments are labeled as toxic).
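The labeling rule above amounts to a simple threshold on the continuous score. A minimal sketch, using made-up scores in place of the dataset's real "hate speech score" column:

```python
import numpy as np

# Hypothetical scores standing in for the dataset's "hate speech score".
scores = np.array([-1.2, 0.3, 0.7, 1.5, 0.5])

# The labeling rule: a comment is "toxic" iff its score exceeds 0.5.
toxic = scores > 0.5
toxic_rate = toxic.mean()  # fraction labeled toxic (~36% on the real data)
```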
Quotes
"Epistemic uncertainty stems from our lack of knowledge about the best model to describe a process. It is reducible as more data or knowledge is gathered."
"Aleatoric uncertainty, in contrast, is due to the inherent randomness in the data or environment and is irreducible even with more data."
"These findings suggest that, while often overlooked, epistemic uncertainty plays a fundamental role in bandit tasks with LLMs."