Core concepts
Modeling uncertainty and using Post-Fusion as a fallback strategy can significantly improve the performance of retrieval-augmented generation with large language models.
Summary
The paper explores different strategies for integrating retrieved passages with large language models (LLMs) to enhance open-domain question answering. The authors first examine the limitations of a commonly used concatenation approach, which often results in "unknown" outputs even when the correct document is among the top-k retrieved passages.
To address this issue, the authors explore four alternative strategies:
- Two single-round methods that utilize chain-of-thought reasoning:
  - Pruning Prompt: Guides the LLM to identify answerable passages by selectively eliminating irrelevant ones.
  - Summary Prompt: Extracts the central information from the top-k passages and uses the synthesized summary to generate the final answer.
- Two multi-round approaches that incorporate feedback loops (both flows are sketched below):
  - Post-Fusion as the Fallback (Concat+PF): Uses the concatenation method in the first round; if the LLM generates an "unknown" response, it falls back to the Post-Fusion approach in a second round.
  - Concatenation as the Distiller (PF+Concat): First leverages Post-Fusion to produce a pool of candidate answers, then concatenates the unfiltered passages with the question and the answer candidates to derive the final answer.
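To make the two multi-round flows concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ask_llm` helper, the `"unknown"` sentinel, and the prompt templates are placeholders, not the paper's actual prompts.

```python
from collections import Counter

UNKNOWN = "unknown"  # assumed sentinel; the paper's actual refusal format may differ


def ask_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM (API client, local model, ...)."""
    raise NotImplementedError


def concat_answer(question: str, passages: list[str]) -> str:
    # Concatenation: stuff all top-k passages into a single prompt.
    context = "\n\n".join(passages)
    return ask_llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def post_fusion_pool(question: str, passages: list[str]) -> list[str]:
    # Post-Fusion: query each passage independently to build an answer pool.
    return [
        ask_llm(f"Context:\n{p}\n\nQuestion: {question}\nAnswer:")
        for p in passages
    ]


def concat_plus_pf(question: str, passages: list[str]) -> str:
    # Concat+PF: round 1 is plain concatenation; Post-Fusion runs as a
    # fallback only when the first round answers "unknown".
    first = concat_answer(question, passages)
    if first.strip().lower() != UNKNOWN:
        return first
    pool = [a for a in post_fusion_pool(question, passages)
            if a.strip().lower() != UNKNOWN]
    return Counter(pool).most_common(1)[0][0] if pool else UNKNOWN


def pf_plus_concat(question: str, passages: list[str]) -> str:
    # PF+Concat: round 1 builds a candidate pool via Post-Fusion; round 2
    # concatenates the passages, question, and candidates so the LLM can
    # distill the final answer.
    candidates = post_fusion_pool(question, passages)
    context = "\n\n".join(passages)
    return ask_llm(
        f"Context:\n{context}\n\nQuestion: {question}\n"
        f"Candidate answers: {', '.join(candidates)}\nFinal answer:"
    )
```

Note that in this flow Concat+PF issues the k extra per-passage calls only when the first round responds "unknown" (roughly 20% of cases, per the Stats below), which is relevant to the token-usage comparison.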
Through comprehensive experiments on three open-domain question-answering datasets (NQ, TriviaQA, and SQuAD), the authors demonstrate that their multi-round approaches outperform the traditional concatenation approach, achieving over a 10% improvement in answer exact match (EM).
The paper also provides insights into the optimal placement of the gold passage within the top-k retrieved passages, the effect of different decoding strategies, and the token usage analysis of the proposed methods.
Stats
The percentage of "unknown" responses using the Concatenation strategy is around 20%.
The error rate of the majority vote approach increases from 12% to 22% as the number of passages increases from 5 to 50.
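As a point of reference for how a majority-vote error rate like the one above could be measured, here is a small sketch; the per-question answer pools, the case-insensitive exact match, and the tie-breaking are assumptions, not the paper's exact evaluation code.

```python
from collections import Counter


def majority_vote(pool: list[str]) -> str:
    # Most frequent answer in the pool; ties resolved by first occurrence.
    return Counter(pool).most_common(1)[0][0]


def majority_vote_error_rate(pools: list[list[str]], golds: list[str]) -> float:
    # Fraction of questions where the majority-voted answer fails a
    # case-insensitive exact match against the gold answer.
    wrong = sum(
        majority_vote(pool).strip().lower() != gold.strip().lower()
        for pool, gold in zip(pools, golds)
    )
    return wrong / len(golds)
```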
Quotes
"Surprisingly, this approach often results in generating "unknown" outputs, even when the correct document is among the top-k retrieved passages."
"Particularly, when more passages are incorporated from 5 to 50, the error rate of the majority vote in the answer pool increases from 12% to 22%."