Core Concepts
The HyKGE framework leverages the reasoning capabilities of large language models to compensate for the incompleteness of user queries, streamline the interaction process, and supply diverse knowledge retrieved from knowledge graphs, thereby improving the accuracy and reliability of medical language model responses.
Abstract
The paper investigates the use of retrieval-augmented generation (RAG) based on knowledge graphs (KGs) to improve the accuracy and reliability of large language models (LLMs) in the medical domain. It identifies three key challenges:
- Insufficient and repetitive knowledge retrieval due to the misalignment between user queries and structured KG knowledge.
- Tedious and time-consuming query parsing and multiple interactions with LLMs to align user intent with KG knowledge.
- Monotonous knowledge utilization due to the difficulty in balancing the diversity and relevance of retrieved knowledge.
To address these challenges, the authors propose the Hypothesis Knowledge Graph Enhanced (HyKGE) framework:
- In the pre-retrieval phase, HyKGE leverages the zero-shot capability and rich knowledge of LLMs to generate hypothesis outputs that provide exploration directions for KG retrieval. It also uses carefully curated prompts to enhance the density and efficiency of LLM responses.
- In the post-retrieval phase, HyKGE introduces the HO Fragment Granularity-aware Rerank Module to filter out noise while ensuring the balance between diversity and relevance in retrieved knowledge.
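The two-phase flow summarized above can be sketched as a toy retrieval pipeline. This is a minimal illustration, not the paper's implementation: the helper names (`extract_entities`, `kg_search`, `rerank`), the overlap-based scoring, and the miniature knowledge graph are all assumptions standing in for the LLM hypothesis output, KG lookup, and the HO Fragment Granularity-aware Rerank Module.

```python
def extract_entities(text, vocab):
    """Toy entity linking: keep only terms found in the KG vocabulary."""
    return {w for w in text.lower().split() if w in vocab}

def kg_search(kg, anchors):
    """Collect triples whose head or tail matches an anchor entity."""
    return [t for t in kg if t[0] in anchors or t[2] in anchors]

def rerank(fragments, anchors, top_k):
    """Score by anchor overlap (relevance), dedupe heads (diversity)."""
    scored = sorted(fragments, key=lambda t: -len({t[0], t[2]} & anchors))
    seen, kept = set(), []
    for t in scored:
        if t[0] not in seen:  # keep at most one fragment per head entity
            kept.append(t)
            seen.add(t[0])
    return kept[:top_k]

def hykge_retrieve(query, hypothesis, kg, top_k=2):
    vocab = {e for t in kg for e in (t[0], t[2])}
    # Pre-retrieval: anchors come from BOTH the user query and the
    # LLM's zero-shot hypothesis output, widening the search directions.
    anchors = extract_entities(query, vocab) | extract_entities(hypothesis, vocab)
    # Post-retrieval: filter and rerank the retrieved KG fragments.
    return rerank(kg_search(kg, anchors), anchors, top_k)

kg = [("aspirin", "treats", "fever"),
      ("aspirin", "interacts_with", "warfarin"),
      ("ibuprofen", "treats", "fever")]
evidence = hykge_retrieve("drug for fever", "aspirin may help fever", kg)
# The hypothesis contributes "aspirin" as an anchor the bare query lacks,
# and the diversity rule surfaces the ibuprofen fragment as well.
```

The key point the sketch mirrors is that the hypothesis output enriches the anchor set before retrieval, while the rerank step trades off relevance (anchor overlap) against diversity (no repeated head entities).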
Experiments on Chinese medical datasets demonstrate the superiority of HyKGE in terms of accuracy and explainability compared to state-of-the-art RAG methods.
Stats
Retrieval-augmented generation can reduce factual errors and improve the reliability of large language models in knowledge-intensive tasks.
Knowledge graphs provide structured knowledge that can facilitate advanced inference capabilities and enable extrapolation for efficient knowledge retrieval.
User queries often exhibit unclear expressions and lack of semantic information, leading to the retrieval of insufficient and repetitive knowledge.
Excessive interactions with large language models can be time-consuming and lead to cumulative errors in the distributed reasoning process.
Balancing the diversity and relevance of retrieved knowledge is a challenge due to the misalignment between monotonous user queries and dense structured knowledge.
Quotes
"Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization."
"To cope with these challenges, we put forward the Hypothesis Knowledge Graph Enhanced (HyKGE) framework, a novel method based on the hypothesis output module (HOM) to explore, locate, and prune search directions for accurate and reliable LLMs responses in pre-retrieval phase and greatly preserve the relevance and diversity of search results at in post-retrieval phase."