Core Concepts
TRAQ provides the first end-to-end statistical correctness guarantee for retrieval augmented question answering by leveraging conformal prediction.
Abstract
The paper proposes Trustworthy Retrieval Augmented Question Answering (TRAQ), a framework that combines retrieval augmented generation (RAG) with conformal prediction to provide statistical guarantees on the correctness of question answering.
Key highlights:
- TRAQ uses conformal prediction to construct prediction sets for the retriever and generator models separately, and then aggregates these sets to obtain an overall correctness guarantee.
- TRAQ introduces a novel nonconformity measure that estimates the uncertainty at the semantic level, enabling it to work with black-box APIs where individual token probabilities are not available.
- TRAQ leverages Bayesian optimization to minimize the average size of the generated prediction sets, improving the efficiency of the approach.
- Extensive experiments show that TRAQ empirically satisfies the desired coverage guarantee and that its prediction sets are 16.2% smaller on average than those of an ablation.
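The two-stage construction described in the highlights can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation. It assumes that nonconformity scores are already computed (lower means more plausible), that the overall error budget is split between the two stages with a union bound (alpha_ret + alpha_gen = alpha), and that a hypothetical `generate` callable returns candidate answers with scores. All function and parameter names are illustrative.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores holds the nonconformity scores of the *correct* item
    (passage or answer) on a held-out calibration set.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every candidate.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [c for c, s in zip(candidates, scores) if s <= threshold]


def aggregate_sets(passages, passage_scores, generate, tau_ret, tau_gen):
    """Build the retriever set, then collect answers over the kept passages.

    generate(passage) is a hypothetical callable returning a list of
    (answer, nonconformity_score) pairs for that passage.
    """
    kept = prediction_set(passages, passage_scores, tau_ret)
    answers = set()
    for passage in kept:
        for answer, score in generate(passage):
            if score <= tau_gen:
                answers.add(answer)
    return kept, answers
```

Under these assumptions, calibrating the retriever at level alpha_ret and the generator at level alpha_gen gives an aggregated set that misses a correct answer with probability at most alpha_ret + alpha_gen. How the budget is divided between the two stages affects the set size, which is the quantity the summary says TRAQ tunes with Bayesian optimization.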
Stats
Large language models (LLMs) frequently generate incorrect responses based on made-up facts, called hallucinations, in open-domain question answering tasks.
Retrieval augmented generation (RAG) is a promising strategy to avoid hallucinations, but it does not provide guarantees on its correctness.
Quotes
"TRAQ uses conformal prediction, a statistical technique for constructing prediction sets that are guaranteed to contain the semantically correct response with high probability."
"TRAQ leverages Bayesian optimization to minimize the size of the constructed sets."