Norouzi, S. S., Barua, A., Christou, A., Gautam, N., Eells, A., Hitzler, P., & Shimizu, C. (2024). Ontology Population using LLMs. arXiv preprint arXiv:2411.01612v1.
This paper investigates the effectiveness of large language models (LLMs) for automatically populating knowledge graphs (KGs), focusing on the Enslaved.org Hub Ontology as a case study. The researchers aim to determine whether LLMs can accurately extract information from unstructured text and convert it into structured data suitable for KG population.
The researchers employed a three-stage methodology:
1. Data Pre-processing: collecting, cleaning, and preparing relevant text data from Wikipedia and aligning it with the Enslaved.org and Wikidata ontologies.
2. Text Retrieval: utilizing text summarization and Retrieval-Augmented Generation (RAG) techniques to identify and extract relevant information from large text files.
3. KG Population: prompting LLMs (GPT-4 and Llama-3) with the extracted text and ontology modules to generate structured data in the form of triples.
The accuracy of the generated triples was evaluated against a ground truth dataset using string similarity metrics.
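The KG-population stage above can be sketched as follows. This is a minimal illustration, not the authors' code: the prompt wording, the `build_prompt` and `parse_triples` helpers, the pipe-delimited output format, and the mock LLM reply are all assumptions standing in for a real GPT-4 or Llama-3 call.

```python
# Hypothetical sketch of the KG-population step: combine an ontology module
# with retrieved text into a prompt, then parse triples from the LLM reply.

def build_prompt(ontology_module: str, passage: str) -> str:
    """Combine an ontology excerpt and retrieved text into one instruction."""
    return (
        "Using only the classes and properties below, extract triples as "
        "'subject | predicate | object', one per line.\n\n"
        f"Ontology:\n{ontology_module}\n\nText:\n{passage}\n"
    )

def parse_triples(reply: str) -> list[tuple[str, str, str]]:
    """Turn 'subject | predicate | object' lines into (s, p, o) tuples,
    skipping any line that does not have exactly three fields."""
    triples = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# Mock reply standing in for an actual model response.
reply = "Harriet Tubman | hasStatus | Freed\nHarriet Tubman | bornIn | Dorchester County"
print(parse_triples(reply))
```

Constraining the model to the ontology's vocabulary in the prompt is what the paper means by using the ontology as guidance: the parser then only has to validate structure, not invent schema.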
The research concludes that LLMs offer a viable approach to automating the population of KGs, showing potential for significant time and cost savings in knowledge engineering tasks. The study highlights the importance of using a well-defined ontology as guidance for LLMs to ensure accurate and consistent data extraction.
This research contributes to the growing field of automated knowledge engineering by demonstrating the potential of LLMs for ontology population. The findings have implications for various domains reliant on KGs, including digital humanities, information retrieval, and semantic web applications.
The evaluation primarily focused on coverage and did not assess the semantic accuracy of the extracted triples in depth. Future research should explore more sophisticated evaluation metrics and investigate how different ontology structures and complexities affect LLM performance. Additionally, comparing the efficiency and accuracy of LLM-based ontology population against human experts would provide valuable insights.
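The string-similarity evaluation mentioned above, and its limitation, can be illustrated with a small sketch. This is an assumption-laden stand-in, not the paper's exact metric: `SequenceMatcher` and the 0.9 threshold are illustrative choices.

```python
# Hypothetical string-similarity check for scoring a generated triple
# against a ground-truth triple, element by element.
from difflib import SequenceMatcher

def triple_matches(generated: tuple, gold: tuple, threshold: float = 0.9) -> bool:
    """A triple counts as correct if every element is similar enough
    (case-insensitive) to the corresponding ground-truth element."""
    return all(
        SequenceMatcher(None, g.lower(), t.lower()).ratio() >= threshold
        for g, t in zip(generated, gold)
    )

gold = ("Harriet Tubman", "bornIn", "Dorchester County")
gen = ("Harriet Tubman", "bornIn", "Dorchester county")
print(triple_matches(gen, gold))  # near-identical strings pass
```

Note the limitation this exposes: a paraphrase that is semantically correct but lexically distant would be scored as wrong, while a fluent but factually incorrect string that happens to be close to the gold text would pass, which is why the summary calls for more nuanced semantic evaluation.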