The authors systematically analyze the tug-of-war between a large language model's (LLM) internal knowledge and retrieved information in a retrieval-augmented generation (RAG) setting, finding a persistent tension between the two sources of knowledge. These findings hold across 6 different datasets spanning over 1,200 questions, using GPT-4, GPT-3.5, and Mistral-7B models. The authors also show that the choice of prompting technique can influence the strength of this relationship.
The results highlight an underlying tension in LLMs between their pre-trained knowledge and the information presented in retrieved context. This has important implications for the reliability of RAG systems, especially as they are increasingly deployed in high-stakes domains like healthcare and finance. The authors caution that users and developers should be aware of these unintended effects when relying on RAG-enabled LLMs.
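One way to make the prior-versus-context tension concrete is to ask a model the same question twice, once closed-book and once with retrieved context, and check which answer the RAG response follows. The sketch below illustrates this probe; the prompt templates and the `adopted_context` helper are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of a prior-vs-context probe (assumed prompt formats,
# not the paper's exact templates).

def closed_book_prompt(question: str) -> str:
    """Prompt relying only on the model's internal (prior) knowledge."""
    return f"Answer from your own knowledge.\nQuestion: {question}\nAnswer:"

def rag_prompt(question: str, context: str) -> str:
    """Prompt that injects retrieved context ahead of the question."""
    return (
        "Answer using the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}\nAnswer:"
    )

def adopted_context(prior_answer: str, rag_answer: str,
                    context_answer: str) -> bool:
    """True if the RAG answer follows the retrieved context rather
    than the model's closed-book (prior) answer."""
    rag = rag_answer.strip().lower()
    return (rag == context_answer.strip().lower()
            and rag != prior_answer.strip().lower())

# Hypothetical example: the retrieved context contradicts the prior.
print(adopted_context("Paris", "Lyon", "Lyon"))   # → True (context won)
print(adopted_context("Paris", "Paris", "Lyon"))  # → False (prior won)
```

Aggregating this boolean over many question/context pairs gives the kind of adoption rate the paper measures across models and datasets.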
Key insights from Kevin Wu, Eri..., arxiv.org, 04-17-2024: https://arxiv.org/pdf/2404.10198.pdf