The report introduces a four-step framework for applying conformal prediction to quantify retrieval uncertainty in retrieval-augmented generation (RAG) systems:
1. Collecting a calibration set of questions answerable from the knowledge base. Each question's embedding is compared against the document embeddings to identify the most relevant document chunks, and their similarity scores are recorded.
2. Analyzing the similarity scores to determine a similarity cutoff threshold that ensures the answer-bearing chunk is captured in the context with a user-specified confidence level (1 − α); a calibration sketch follows this list.
3. During inference, retrieving all chunks whose similarity exceeds the threshold and providing them as context to the LLM (see the retrieval sketch below).
4. Providing a Python package to automate the entire workflow using LLMs, without human intervention.
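The calibration step (step 2) can be illustrated with a minimal sketch. The names below (`embed`, `answer_chunks`, `calibrate_threshold`) are placeholders rather than the report's package API; the sketch assumes cosine similarity between embeddings and the standard split-conformal order statistic for choosing the cutoff.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def calibrate_threshold(questions, answer_chunks, embed, alpha=0.1):
    """Pick a similarity cutoff so that, with probability >= 1 - alpha,
    the chunk containing the true answer scores above the cutoff.

    questions     : calibration questions answerable from the knowledge base
    answer_chunks : the chunk known to contain each question's answer
    embed         : callable mapping text -> embedding vector (placeholder)
    """
    # Step 1: record each question's similarity to its answer-bearing chunk.
    cal_scores = np.array([
        cosine_sim(embed(q), embed(c)) for q, c in zip(questions, answer_chunks)
    ])

    # Step 2: split-conformal cutoff = the floor(alpha * (n + 1))-th smallest score.
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))
    if k == 0:
        # Too few calibration points for this alpha; retain every chunk.
        return -np.inf
    return float(np.sort(cal_scores)[k - 1])
```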
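At inference time (step 3), retrieval then reduces to filtering on the calibrated cutoff. Again a sketch with assumed placeholder names, reusing `cosine_sim` from the calibration sketch above:

```python
def retrieve(query, chunks, chunk_embeddings, embed, threshold):
    """Return every chunk whose similarity to the query meets the calibrated
    cutoff; under exchangeability, this set contains the answer-bearing chunk
    with probability >= 1 - alpha."""
    q_emb = embed(query)
    scored = [(cosine_sim(q_emb, e), c) for c, e in zip(chunks, chunk_embeddings)]
    return [c for s, c in scored if s >= threshold]

# Usage sketch: the retrieved chunks are concatenated into the LLM prompt.
# threshold = calibrate_threshold(cal_questions, cal_answer_chunks, embed, alpha=0.1)
# context = "\n\n".join(retrieve(user_question, kb_chunks, kb_embeddings, embed, threshold))
```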
The framework addresses limitations of RAG, such as the potential for retrieval failure or contradictory content, by quantifying uncertainty in the retrieval process. However, the effectiveness of the approach depends on the calibration data being representative of real-world questions, the embedding model's performance, and the downstream LLM's ability to handle uncertainty in the provided context.