
Mapping Brazilian Historical Entities to Wikidata: Challenges and Opportunities


Core Concepts
The authors aim to construct a knowledge graph for Brazilian history by mapping entities from the Brazilian Dictionary of Historical Biographies (DHBB) to Wikidata, the largest structured database of entities associated with Wikipedia. They find that many DHBB entities are not present in Wikidata, highlighting the need to complete Wikidata with information from this valuable historical resource.
Abstract
The paper discusses the first steps in a project to construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata. The DHBB is an encyclopedic resource that provides organized and systematic information about personalities and themes in recent Brazilian history. The authors find that many of the terms and entities described in the DHBB do not have corresponding concepts (or Q items) in Wikidata, the largest structured database of entities associated with Wikipedia.

They describe previous work on extracting information from the DHBB and outline the steps to construct a Wikidata-based historical knowledge graph. The paper highlights the challenges in mapping DHBB entries to Wikidata, including disambiguation issues, entities that are part of larger concepts, and entities that no longer exist. The authors also discuss the limitations of Wikidata in representing entities specific to Brazilian history and culture.

The authors propose a crowd-sourcing project to improve the mapping between DHBB titles and Wikidata concepts, as well as to complete Wikidata with the named entities from the DHBB that are currently missing. This effort aims to make the wealth of information in the DHBB more widely available and facilitate the construction of a comprehensive knowledge graph for Brazilian history.
Stats
The DHBB has 7,863 entries, including over 6,800 biographies and around 1,000 thematic entries. The authors were able to map 498 out of 973 thematic entries to Wikidata, and 4,300 out of 6,980 biographical entries.
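To put the mapping counts above in proportion, the short sketch below turns them into coverage percentages. It uses only the four figures reported in the Stats paragraph; the function and variable names are illustrative. Note that an entry counted as mapped only has some candidate Q item, which is not necessarily the correct one.

```python
# Counts as reported in the Stats paragraph above.
mapped_thematic, total_thematic = 498, 973
mapped_bio, total_bio = 4300, 6980


def coverage(mapped: int, total: int) -> float:
    """Return the share of entries with a candidate Q item, as a percentage."""
    return 100.0 * mapped / total


thematic_pct = coverage(mapped_thematic, total_thematic)
bio_pct = coverage(mapped_bio, total_bio)

print(f"thematic entries mapped:     {thematic_pct:.1f}%")  # about 51.2%
print(f"biographical entries mapped: {bio_pct:.1f}%")       # about 61.6%
```

Roughly half of the thematic entries and about three fifths of the biographical entries have a candidate Q item under these counts.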
Quotes
"We contend that large repositories of Brazilian-named entities (people, places, organizations, and political events and movements) would be beneficial for extracting information from Portuguese texts."

"Given that 498 thematic entries have some Q items in Wikidata, are these correct? There are some very wrong ones, e.g., political parties from other countries."

"Overall, Wikidata does not have the necessary information about entities of Brazilian recent history, at least as far as the DHBB's thematic entries are concerned."

Key Insights Distilled From

by Valeria de P... at arxiv.org 04-01-2024

https://arxiv.org/pdf/2403.19856.pdf
Towards a Brazilian History Knowledge Graph

Deeper Inquiries

How can the mapping between DHBB entries and Wikidata concepts be improved through crowdsourcing and collaboration with domain experts?

Crowdsourcing and collaboration with domain experts can significantly improve the mapping between DHBB entries and Wikidata concepts. Historians, researchers, and other subject matter experts can supply the historical context and the specifics of Brazilian history that are needed to disambiguate entities and confirm that a mapping is correct.

Crowdsourcing opens the mapping task to a broader community, allowing volunteers to contribute their own knowledge. Volunteers can help identify and resolve mapping errors, fill in missing entries, and verify existing mappings. A platform or tool where users can suggest corrections, validate mappings, and add new entities would make the resulting knowledge graph for Brazilian history more robust and reliable.

Together, expert collaboration and crowdsourcing can also address challenges such as ambiguous entity names, outdated information, and missing entries. Drawing on a diverse group of contributors allows the mapping to be refined, validated, and expanded into a more comprehensive and accurate representation of Brazilian historical entities in Wikidata.
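One way such a validation tool could combine volunteer suggestions is a simple agreement rule, sketched below. This is an assumption for illustration, not the workflow proposed in the paper: the function name, the thresholds, the status labels, and the placeholder identifiers such as `Q_A` are all hypothetical. A vote of `None` means the volunteer found no matching item in Wikidata.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=3, min_agreement=0.6):
    """Combine (entry_title, suggested_item_or_None) votes into one decision per entry.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    by_entry = {}
    for title, item in votes:
        by_entry.setdefault(title, []).append(item)

    decisions = {}
    for title, items in by_entry.items():
        if len(items) < min_votes:
            decisions[title] = ("needs_more_votes", None)
            continue
        top, count = Counter(items).most_common(1)[0]
        if count / len(items) < min_agreement:
            # Volunteers disagree: escalate to a domain expert.
            decisions[title] = ("needs_expert_review", None)
        elif top is None:
            # Consensus that no item exists: candidate for a new Wikidata item.
            decisions[title] = ("missing_in_wikidata", None)
        else:
            decisions[title] = ("accepted", top)
    return decisions


# Hypothetical votes; entry titles and item identifiers are placeholders.
votes = [
    ("Entry A", "Q_A"), ("Entry A", "Q_A"), ("Entry A", "Q_B"),
    ("Entry B", None), ("Entry B", None), ("Entry B", None),
    ("Entry C", "Q_C"), ("Entry C", "Q_D"), ("Entry C", None),
    ("Entry D", "Q_E"),
]
decisions = aggregate_votes(votes)
for title, decision in sorted(decisions.items()):
    print(title, decision)
```

Routing low-agreement entries to experts keeps scarce expert time for the hard cases, while entries with a clear "no match" consensus become candidates for new Wikidata items.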

What are the potential challenges and limitations in representing entities specific to Brazilian history and culture in a multilingual knowledge graph like Wikidata?

Representing entities specific to Brazilian history and culture in a multilingual knowledge graph like Wikidata poses several challenges. One is the diversity and complexity of Brazilian names, which may have variants, aliases, or multiple components (e.g., surnames, titles) that make them ambiguous and hard to map to Wikidata concepts. The problem is compounded by the fact that Wikidata, although multilingual, does not always cover Brazilian-specific entities comprehensively.

Another challenge is Wikidata's notability policy. An item is generally expected to have a sitelink to a Wikimedia page, to refer to a clearly identifiable entity that can be described with serious, publicly available references, or to fulfil a structural need. Lesser-known or niche entities from Brazilian history and culture may struggle to meet these conditions when they have little coverage in mainstream sources or publications.

Language barriers and translation issues add further difficulty. Translating entity names, descriptions, and attributes from Portuguese into the other languages of Wikidata may introduce errors, inconsistencies, or loss of context, lowering the quality and accuracy of the representation.

Finally, keeping Brazilian entities relevant and current in a dynamic knowledge graph requires continuous updates, monitoring, and collaboration with local experts, so that the information stays accurate and reflects ongoing scholarship on Brazilian history and culture.
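The name-variation problem described above can be illustrated with a small normalization step that folds accents, case, and spacing before labels are compared. This is a minimal sketch using Python's standard `unicodedata` module; the function names and the placeholder item identifiers are assumptions, and the paper does not state that it matches names this way.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Fold accents, case, and repeated whitespace so that variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def build_index(items):
    """Map each normalized label or alias to the set of items that carry it."""
    index = {}
    for item_id, labels in items.items():
        for label in labels:
            index.setdefault(normalize_name(label), set()).add(item_id)
    return index


# Hypothetical items: identifiers are placeholders, not real Wikidata Q items.
items = {
    "Q_PERSON": ["Getúlio Vargas", "Getulio Dornelles Vargas"],
    "Q_PLACE": ["GETÚLIO  VARGAS"],  # a same-named entity of another type
}
index = build_index(items)
candidates = index[normalize_name("getulio vargas")]
print(sorted(candidates))
```

Normalization raises recall but also surfaces collisions: here two different items share one normalized label, so a further disambiguation step (entity type, dates, or human review) is still needed.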

How can the knowledge graph constructed from the DHBB be leveraged to support digital humanities research on Brazilian history and culture, beyond the extraction of named entities?

The knowledge graph constructed from the DHBB can serve digital humanities research on Brazilian history and culture well beyond the extraction of named entities. With DHBB information structured and linked to Wikidata, researchers can explore relationships, patterns, and trends in Brazilian history in a more systematic and interconnected way.

One use is network analysis. Researchers can analyze the connections between historical figures, events, organizations, and movements to uncover influence networks and historical trajectories. This can shed light on the social, political, and cultural dynamics of Brazilian history and help identify key actors and pivotal moments.

The knowledge graph can also support text mining and natural language processing by supplying structured data for tasks such as sentiment analysis, topic modeling, and trend detection in Brazilian historical texts. Machine learning methods can then be applied to find patterns in the data.

The knowledge graph can further facilitate interdisciplinary research by enabling scholars from different fields to collaborate, share data, and bring different perspectives to Brazilian history and culture. Integrating data from multiple sources and domains supports comparative studies, cross-referencing, and contextual analysis.

Overall, the knowledge graph constructed from the DHBB can be a useful tool for data-driven exploration and analysis of Brazilian history and culture.
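As a rough illustration of the network analysis mentioned above, the sketch below links entities that are mentioned in the same entry and ranks them by degree centrality. The input format, the function names, and the sample data are hypothetical; the paper does not prescribe this method, and a real study would start from the mapped DHBB entities.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(entry_mentions):
    """Link every pair of entities that are mentioned in the same entry."""
    graph = defaultdict(set)
    for entities in entry_mentions.values():
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical mentions: entry identifiers and entity names are placeholders.
entry_mentions = {
    "entry_1": ["Person A", "Party X"],
    "entry_2": ["Person A", "Person B", "Party X"],
    "entry_3": ["Person B", "Event Y"],
}
graph = build_cooccurrence_graph(entry_mentions)
centrality = degree_centrality(graph)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

Entities that never co-occur with another entity do not appear in this graph, and co-occurrence is only a proxy for a historical relationship, so high-scoring nodes are leads for closer reading rather than conclusions.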