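The paper presents a Graph RAG approach that addresses a limitation shared by existing retrieval-augmented generation (RAG) and query-focused summarization (QFS) methods on large text corpora. RAG struggles with global questions about an entire corpus (for example, "What are the main themes in the dataset?"), because these are summarization tasks rather than explicit retrieval tasks. Prior QFS methods, meanwhile, do not scale to the volumes of text that typical RAG systems index.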
The key aspects of the approach are:
Text Chunking and Element Extraction: The source documents are split into text chunks, and an LLM is used to extract entities, relationships, and claims from these chunks, generating a graph-based index.
Graph Community Detection: A community detection algorithm (Leiden, in the paper) partitions the graph index into a hierarchy of communities of closely related elements.
Community Summarization: LLM-generated summaries are created for each community in the hierarchy, providing comprehensive coverage of the underlying graph index and source documents.
Query-Focused Summarization: When answering a user query, the community summaries are used in a map-reduce approach: partial answers are first generated from each relevant community summary and then summarized into a final global answer.
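The four steps above can be sketched end to end as follows. This is a minimal, hypothetical sketch, not the paper's implementation: all function names are illustrative, words stand in for tokens, and each LLM call is replaced by a toy deterministic stand-in (capitalized words as entities, co-occurrence within a chunk as relationships, a formatted string as the community summary, keyword overlap as the helpfulness score). Single-level label propagation is swapped in for the paper's hierarchical Leiden community detection, so the sketch has no hierarchy levels to choose from.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical offline sketch of the Graph RAG pipeline. LLM calls are replaced by
# toy deterministic stand-ins, and Leiden is replaced by single-level label propagation.

Graph = dict[str, Counter]


def norm(word: str) -> str:
    return word.strip(".,;:()?!").lower()


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping chunks of words (words stand in for tokens)."""
    assert 0 <= overlap < chunk_size
    words, step = text.split(), chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def extract_elements(chunk: str) -> tuple[set[str], list[tuple[str, str]]]:
    """Toy stand-in for LLM extraction: capitalized words are entities, and entities
    co-occurring in a chunk are related. Claims are not extracted here."""
    entities = {w.strip(".,;:()?!") for w in chunk.split() if w[:1].isupper()}
    return entities, list(combinations(sorted(entities), 2))


def build_graph(chunks: list[str]) -> Graph:
    """Undirected entity graph; edge weight = number of chunks containing the pair."""
    graph: Graph = {}
    for chunk in chunks:
        entities, relations = extract_elements(chunk)
        for entity in entities:
            graph.setdefault(entity, Counter())
        for a, b in relations:
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph: Graph, max_iter: int = 20) -> dict[str, str]:
    """Asynchronous label propagation: each node adopts the label with the largest
    total edge weight among its neighbors (ties go to the smallest label)."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            votes: Counter = Counter()
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            if not votes:
                continue  # an isolated entity keeps its own label
            best = max(votes.values())
            new = min(label for label, v in votes.items() if v == best)
            if new != labels[node]:
                labels[node], changed = new, True
        if not changed:
            break
    return labels


def summarize_community(members: list[str], graph: Graph) -> str:
    """Toy stand-in for an LLM community summary: members plus strongest links."""
    edges = sorted(((graph[a][b], a, b) for a, b in combinations(sorted(members), 2)
                    if graph[a][b]), key=lambda e: (-e[0], e[1], e[2]))
    links = "; ".join(f"{a}-{b} ({w})" for w, a, b in edges[:3])
    return f"Entities: {', '.join(sorted(members))}. Links: {links or 'none'}."


def answer_query(query: str, summaries: list[str], top_k: int = 2) -> str:
    """Map: score each summary against the query (keyword overlap stands in for an
    LLM helpfulness rating; the partial answer is just the summary) and drop zeros.
    Reduce: join the top-scoring partial answers (stands in for a final LLM call)."""
    terms = {norm(t) for t in query.split()}
    partials = []
    for summary in summaries:  # map step; independent, so it could run in parallel
        score = len(terms & {norm(w) for w in summary.split()})
        if score > 0:
            partials.append((score, summary))
    partials.sort(key=lambda p: p[0], reverse=True)
    return " | ".join(text for _, text in partials[:top_k]) or "No relevant communities."


def graph_rag(text: str, query: str, chunk_size: int, overlap: int) -> str:
    graph = build_graph(chunk_text(text, chunk_size, overlap))
    communities: dict[str, list[str]] = defaultdict(list)
    for node, label in label_propagation(graph).items():
        communities[label].append(node)
    summaries = [summarize_community(m, graph) for _, m in sorted(communities.items())]
    return answer_query(query, summaries)


if __name__ == "__main__":
    corpus = "Alice met Bob and Carol in Paris. Dave called Erin and Frank from Rome."
    print(graph_rag(corpus, "Who met Alice in Paris?", chunk_size=7, overlap=0))
    # prints: Entities: Alice, Bob, Carol, Paris. Links: Alice-Bob (1); Alice-Carol (1); Alice-Paris (1).
```

In the demo, the two sentences land in separate chunks, so the graph splits into two communities and only the one mentioning Alice and Paris survives the map step. In a real deployment each stand-in would be an LLM prompt, and the map step would run over the summaries from one chosen level of the community hierarchy.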
The evaluation compares this Graph RAG approach with a naive RAG baseline and a global text summarization approach on two datasets, each in the 1-million-token range. Graph RAG outperforms naive RAG on the comprehensiveness and diversity of the generated answers. With intermediate- and low-level community summaries, it also compares favorably with text summarization on these metrics while requiring fewer tokens.
Key insights extracted from the paper by Darren Edge,... on arxiv.org, 04-26-2024
https://arxiv.org/pdf/2404.16130.pdf