Core Concept
This research proposes a novel approach to constructing a knowledge graph specifically for liver cancer from Chinese Electronic Medical Records (EMRs), leveraging deep learning techniques for entity recognition and knowledge fusion to improve the accuracy and completeness of medical knowledge representation.
Summary
Bibliographic Information:
Zhang, Y., Wang, H., Gao, Y., Hu, X., Fan, Y., & Fang, Z. (2024). Liver Cancer Knowledge Graph Construction based on dynamic entity replacement and masking strategies RoBERTa-BiLSTM-CRF model. arXiv preprint arXiv:2410.18090v1.
Research Objective:
This paper presents a novel method for constructing a knowledge graph specifically focused on liver cancer, aiming to address the challenges of inconsistent terminology and dispersed knowledge distribution in Chinese EMRs.
Methodology:
The researchers developed a systematic workflow involving:
- Conceptual Layer Design: Defining entity types and relationships relevant to liver cancer based on expert knowledge and medical resources.
- Data Preprocessing: Converting EMRs into a usable format and annotating key entities.
- Entity Recognition: Implementing a DERM-RoBERTa-BiLSTM-CRF model, incorporating dynamic entity replacement and masking strategies, to extract entities from EMRs.
- Knowledge Fusion: Aligning and merging extracted entities with a medical knowledge base (www.XYWY.com) using TF-IDF to enhance knowledge completeness.
- Knowledge Graph Construction: Populating a Neo4j graph database with extracted entities and relationships.
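The knowledge fusion step above can be illustrated with a minimal sketch of TF-IDF-based term alignment. The summary only states that TF-IDF is used to align EMR entities with the knowledge base; the character-bigram tokenization, the smoothed IDF formula, cosine similarity as the matching score, the `threshold` value, and every function name below are assumptions made for illustration, not the authors' implementation. `kb_terms` is assumed to be non-empty.

```python
import math
from collections import Counter


def char_bigrams(term):
    """Split a term into overlapping character bigrams (assumed tokenization)."""
    return [term[i:i + 2] for i in range(len(term) - 1)] or [term]


def tfidf_vectors(terms):
    """Return one sparse TF-IDF vector (a dict) per term, treating each term as a document."""
    docs = [Counter(char_bigrams(t)) for t in terms]
    n = len(docs)
    # Document frequency: number of terms in which each bigram occurs.
    df = Counter(gram for doc in docs for gram in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            gram: (count / total) * (math.log((1 + n) / (1 + df[gram])) + 1)
            for gram, count in doc.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(emr_terms, kb_terms, threshold=0.5):
    """Map each EMR term to its most similar knowledge-base term, or None below the threshold."""
    vectors = tfidf_vectors(emr_terms + kb_terms)
    emr_vecs, kb_vecs = vectors[:len(emr_terms)], vectors[len(emr_terms):]
    result = {}
    for term, vec in zip(emr_terms, emr_vecs):
        score, match = max((cosine(vec, kv), kt) for kv, kt in zip(kb_vecs, kb_terms))
        result[term] = match if score >= threshold else None
    return result


# Illustrative terms only: 腹水 = ascites, 肝癌 = liver cancer, 高血压 = hypertension.
print(align(["腹水", "高血压"], ["肝癌", "腹水"]))
```

Character bigrams are used here because Chinese text has no whitespace word boundaries; a real pipeline might instead use a word segmenter, and the similarity threshold would need tuning on annotated term pairs.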
Key Findings:
- The proposed DERM-RoBERTa-BiLSTM-CRF model achieved a 4.3% improvement in F1 score for entity recognition compared to the baseline BERT-BiLSTM-CRF model.
- Knowledge fusion using TF-IDF effectively addressed inconsistencies in medical terminology between EMRs and the online knowledge base.
- The constructed liver cancer knowledge graph demonstrated practical utility in retrieving potential complications associated with liver cancer.
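The complication lookup mentioned in the last finding amounts to following one relation type outward from a disease node. The sketch below shows that idea with an in-memory triple store rather than Neo4j; the `TripleStore` class, the relation name `has_complication`, and the example entities are hypothetical placeholders, not the schema or data of the paper's graph.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory stand-in for a graph database, indexed by head entity and relation."""

    def __init__(self):
        self._out = defaultdict(lambda: defaultdict(set))

    def add(self, head, relation, tail):
        """Store one (head, relation, tail) triple."""
        self._out[head][relation].add(tail)

    def tails(self, head, relation):
        """Return all tail entities linked to `head` by `relation`, sorted for stable output."""
        return sorted(self._out.get(head, {}).get(relation, set()))


# Hypothetical triples for illustration only.
kg = TripleStore()
kg.add("liver cancer", "has_complication", "ascites")
kg.add("liver cancer", "has_complication", "hepatic encephalopathy")
kg.add("liver cancer", "has_symptom", "abdominal pain")

print(kg.tails("liver cancer", "has_complication"))
```

In a graph database the same lookup would be a single-hop pattern match from the disease node along the complication relationship.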
Main Conclusions:
This study constructed a liver cancer-specific knowledge graph from Chinese EMRs and showed that the proposed methodology improves the accuracy and completeness of medical knowledge representation. Deep learning-based entity recognition and TF-IDF-based knowledge fusion both contributed to the quality of the resulting knowledge graph.
Significance:
This research provides a valuable framework for constructing specialized knowledge graphs for other diseases, potentially aiding in clinical decision support, patient education, and medical research.
Limitations and Future Research:
- The study was limited to Chinese EMRs and a specific medical knowledge base.
- Future research could explore the generalizability of the proposed methodology to other languages and medical domains.
- Further investigation into downstream applications of the liver cancer knowledge graph, such as drug repurposing and personalized treatment recommendations, is warranted.
Statistics
Liver cancer constitutes approximately 10.4% of all cancer cases worldwide.
Liver cancer accounts for 6.3% of all cancer-related fatalities.
The constructed liver cancer KG contains 12 types of entities, totaling 46,365 entities and 296,655 triples.
The DERM-RoBERTa-BiLSTM-CRF model achieved a 4.3% improvement in F1 score for entity recognition compared to the baseline BERT-BiLSTM-CRF model.
The Operation entity achieved the highest F1 score of 100%, while the Symptoms entity recorded the lowest with an F1 score of 86.06%.
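For readers unfamiliar with how per-entity F1 scores such as those above are usually obtained, the following sketch computes precision, recall, and F1 over exactly matching entity spans. The summary does not state the paper's matching criterion, so exact span-and-type matching, the `(start, end, type)` tuple layout, and the function name `entity_f1` are assumptions.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, F1) over entities given as (start, end, type) tuples.

    An entity counts as correct only if its span and type both match exactly
    (an assumed criterion).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative annotations only: two of three predicted entities match the gold set.
gold = [(0, 2, "Disease"), (5, 7, "Symptoms"), (10, 14, "Operation")]
pred = [(0, 2, "Disease"), (5, 8, "Symptoms"), (10, 14, "Operation")]
print(entity_f1(gold, pred))
```

Restricting both lists to one entity type before calling the function gives a per-type score of the kind reported for Operation and Symptoms.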