
Poseidon: An Open Framework for Archaeogenetic Human Genotype Data Management and Sharing


Core Concepts
Poseidon is an open framework that provides a standardized data format, software tools, and public archives to enable FAIR handling of archaeogenetic human genotype data.
Abstract
The study of ancient human genomes has accelerated in the last decade, with thousands of new ancient genomes being released each year. However, infrastructure is lacking for the rich context data (e.g., spatiotemporal provenience) that accompanies ancient samples, and there are no standardized archives for the derived genotype data used in most archaeogenetic studies. The Poseidon framework was developed to address these issues. It consists of three main components:
A data format (the Poseidon package) that stores genotype data together with context information in a structured, human- and machine-readable format.
Software tools that work with the Poseidon package format, such as trident for data management, xerxes for data analysis, and qjanno for querying context data.
Public archives (the Poseidon Community Archive, Poseidon Minotaur Archive, and Poseidon AADR Archive) that store and maintain Poseidon packages, enabling community-driven data sharing and curation.
The Poseidon framework aims to simplify data storage, acquisition, analysis, and publication in human archaeogenetics, ensuring FAIR data handling and promoting computational reproducibility. Its modular design allows for flexible adoption, from using the package format locally to contributing to the public archives.
Stats
"Archaeogenetic samples can only be effectively analysed with context data."
"Recently, the threshold of genome-wide data for 10,000 ancient human individuals has been surpassed."
"Poseidon features public archives with per-article packages that can be downloaded through an open web API."
Quotes
"To make all this new data publicly available, researchers can partly rely on existing infrastructure for the archival and distribution of modern genetic data, such as the Sequence Read Archive (SRA) [11], the European Nucleotide Archive (ENA) [12] or other INSDC databases (https://www.insdc.org). However, this infrastructure has not been prepared to also capture the rich context-data ranging from archaeological field observations to radiocarbon dating that accompanies ancient samples."
"Poseidon emphasises human- and machine-readable data storage, the development of convenient and interoperable command line software, and a high degree of source granularity to elevate the original data publication to the main unit of long-term curation."

Deeper Inquiries

How can the Poseidon framework be extended to integrate with larger Linked Open Data systems and ontologies?

To integrate the Poseidon framework with larger Linked Open Data systems and ontologies, several steps can be taken:
Utilize Existing Ontologies: Poseidon can incorporate existing ontologies relevant to archaeogenetics, such as the Human Phenotype Ontology or the Sequence Ontology. Mapping Poseidon's data elements to these ontologies can enhance interoperability and data integration with external systems.
Semantic Web Technologies: Poseidon can adopt Semantic Web technologies like RDF (Resource Description Framework) to represent its data in a linked data format. This would enable Poseidon data to be linked and queried alongside other datasets in the Linked Open Data cloud.
Linked Data Principles: By adhering to Linked Data principles, Poseidon can ensure that its data is uniquely identified, linked to other related resources, and published in a machine-readable format. This would facilitate data discovery and integration with external datasets.
Collaboration with Ontology Developers: Poseidon can collaborate with ontology developers in the field of archaeogenetics to create domain-specific ontologies that align with Poseidon's data model. This collaboration can ensure that Poseidon's data is semantically enriched and easily integrated with external ontologies.
APIs for Data Integration: Poseidon can develop APIs that allow external systems to query and retrieve Poseidon data in a standardized format. Well-documented APIs would facilitate data exchange and integration with other Linked Open Data systems.
Continuous Development and Updates: Poseidon should stay abreast of developments in Linked Open Data standards and ontologies to ensure compatibility and interoperability with evolving data integration practices. Regular updates and enhancements to the framework can help maintain alignment with best practices in the Linked Open Data community.
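
As a rough illustration of the RDF idea above, the following sketch turns one context record into N-Triples lines using only the Python standard library. It is not part of Poseidon: the namespace `https://example.org/poseidon/`, the function names, and the choice of one predicate per field are all assumptions made for the example, and the field names in the sample record are illustrative only.

```python
from urllib.parse import quote

# Hypothetical namespace for this sketch; Poseidon does not define these IRIs.
BASE = "https://example.org/poseidon/"


def literal(value: str) -> str:
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record: dict[str, str], id_field: str = "Poseidon_ID") -> list[str]:
    """Emit one triple per non-empty field, with the sample ID as subject."""
    subject = f"<{BASE}individual/{quote(record[id_field], safe='')}>"
    triples = []
    for field, value in record.items():
        if field == id_field or value == "":
            continue  # the ID is the subject; empty fields carry no statement
        predicate = f"<{BASE}field/{quote(field, safe='')}>"
        triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


# Illustrative record; the values are made up.
example = {"Poseidon_ID": "Sample001", "Country": "Greece", "Genetic_Sex": "F", "Site": ""}
for line in record_to_ntriples(example):
    print(line)
```

In a real integration, the per-field predicates would more plausibly be mapped to terms from established vocabularies instead of a private namespace. That mapping is the ontology work described above.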

How can the Poseidon framework be adapted to handle other types of ancient genomic data beyond human archaeogenetics, such as ancient animal or plant genomes?

Adapting the Poseidon framework to handle other types of ancient genomic data, such as ancient animal or plant genomes, involves the following considerations:
Data Model Extension: Poseidon's data model can be extended to accommodate the specific characteristics of animal or plant genomic data. This may include additional data fields, metadata requirements, and quality control measures tailored to non-human genomes.
Ontology Integration: Incorporating domain-specific ontologies for animal or plant genomics into Poseidon can enhance data standardization and interoperability. Aligning with relevant ontologies helps keep data representation consistent across genomic domains.
Genomic Data Formats: Supporting a wider range of genomic data formats commonly used in animal and plant genomics, such as VCF (Variant Call Format) for genetic variation data, can broaden Poseidon's applicability to diverse genomic datasets.
Species-specific Context Data: Adapting Poseidon to capture species-specific context data relevant to animal or plant genomes, such as ecological information, taxonomic classifications, and sample provenance, can enrich the framework's capability to handle non-human genomic data.
Community Engagement: Involving researchers and experts in animal and plant genomics in the development and expansion of Poseidon can ensure that the framework meets the specific needs of these scientific communities. Collaboration with domain specialists can guide the adaptation of Poseidon to diverse genomic datasets.
Tool Development: Developing specialized tools within the Poseidon framework for the analysis and visualization of animal and plant genomic data can enhance its utility for researchers working in these domains. Customized functionality for processing and interpreting non-human genomic data can make Poseidon more versatile and user-friendly.
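
The data model extension and species-specific context data mentioned above could look roughly like the sketch below: a species-agnostic core record extended with taxon fields and a small validation step. Every class and field name here is hypothetical and chosen only for illustration; none of it reflects the actual Poseidon schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextRecord:
    """Species-agnostic core fields (hypothetical names)."""
    sample_id: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    date_bc_ad: Optional[int] = None  # calendar year; negative values mean BC


@dataclass
class TaxonContextRecord(ContextRecord):
    """Adds species-specific fields for non-human samples (hypothetical)."""
    taxon_name: str = ""
    taxon_id: Optional[str] = None       # identifier in an external taxonomy, if any
    domesticated: Optional[bool] = None  # None means unknown

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record passed."""
        problems = []
        if not self.taxon_name.strip():
            problems.append("taxon_name is required for non-human samples")
        if self.latitude is not None and not -90 <= self.latitude <= 90:
            problems.append("latitude out of range")
        if self.longitude is not None and not -180 <= self.longitude <= 180:
            problems.append("longitude out of range")
        return problems


# Made-up record for demonstration.
cow = TaxonContextRecord(sample_id="X1", latitude=48.1, longitude=11.6, taxon_name="Bos taurus")
print(cow.validate())
```

Keeping the shared fields in a base class means that tooling written against the core record would keep working on extended records, while checks specific to a taxon stay in the subclass.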

What challenges might arise in maintaining a healthy community of developers, contributors, and maintainers for the long-term sustainability of the Poseidon project?

Maintaining a healthy community of developers, contributors, and maintainers for the long-term sustainability of the Poseidon project may face several challenges:
Sustained Engagement: Ensuring continued engagement and participation from community members over an extended period can be challenging. Keeping contributors motivated and involved in the project requires ongoing communication, recognition of their contributions, and opportunities for skill development and growth.
Resource Constraints: Limited resources, such as funding, infrastructure, and personnel, can hinder the growth and sustainability of the community. Securing long-term support and resources for the project is essential to sustain community activities and development efforts.
Community Dynamics: Managing diverse perspectives, conflicting priorities, and potential conflicts within the community can pose challenges. Building a collaborative and inclusive community culture, resolving conflicts constructively, and fostering a sense of belonging among members are crucial for maintaining a healthy community environment.
Skill Diversity: Ensuring a diverse skill set among community members is important for the project's success. Balancing technical expertise, domain knowledge, and project management skills within the community can be a challenge and may require targeted recruitment and training efforts.
Leadership Succession: Planning for leadership succession and continuity is vital for the long-term sustainability of the project. Identifying and nurturing future leaders, documenting processes and best practices, and establishing governance structures that support smooth transitions are key considerations for maintaining community stability.
External Dependencies: External dependencies, such as changes in funding sources, technological advancements, or shifts in research priorities, can impact the project's sustainability. Developing contingency plans, diversifying funding sources, and staying adaptable to external changes can help mitigate the associated risks.
Community Health Monitoring: Regularly monitoring the health and dynamics of the community, collecting feedback from members, and addressing emerging issues proactively are essential for maintaining a vibrant and sustainable community. Implementing mechanisms for feedback, evaluation, and continuous improvement can help address challenges and foster community resilience.