Core Idea
The FAIR Jupyter knowledge graph enables granular exploration and analysis of a dataset on the computational reproducibility of Jupyter notebooks associated with biomedical publications.
Abstract
The FAIR Jupyter project aims to enhance the accessibility and reusability of a dataset on the computational reproducibility of Jupyter notebooks associated with biomedical publications. The original dataset, which was previously shared as a SQLite database, has been converted into a knowledge graph using semantic web technologies.
The knowledge graph represents various entities from the dataset, including publications, GitHub repositories, Jupyter notebooks, and details about their reproducibility. By modeling the data using ontologies like PROV-O, REPRODUCE-ME, and FaBiO, the knowledge graph enables fine-grained querying and exploration of the dataset.
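To make the idea of fine-grained querying concrete, here is a minimal sketch of triple pattern matching over an in-memory set of triples, using only the Python standard library. It is a toy stand-in for a real triple store, not the project's implementation. Every identifier under the `ex:` prefix is hypothetical; the actual graph uses terms from PROV-O, REPRODUCE-ME, and FaBiO, which are not reproduced here.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data: every "ex:" name is invented for illustration.
TRIPLES: Set[Triple] = {
    ("ex:article1", "ex:type", "ex:Publication"),
    ("ex:repo1", "ex:mentionedIn", "ex:article1"),
    ("ex:notebook1", "ex:partOf", "ex:repo1"),
    ("ex:notebook2", "ex:partOf", "ex:repo1"),
    ("ex:notebook1", "ex:reproductionStatus", "success"),
    ("ex:notebook2", "ex:reproductionStatus", "failure"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in sorted(TRIPLES):  # sorted for deterministic output
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def reproduced_notebooks(article: str) -> List[str]:
    """Follow article -> repository -> notebook and keep successful runs."""
    found: List[str] = []
    for repo, _, _ in match(p="ex:mentionedIn", o=article):
        for notebook, _, _ in match(p="ex:partOf", o=repo):
            if any(match(s=notebook, p="ex:reproductionStatus", o="success")):
                found.append(notebook)
    return found


print(reproduced_notebooks("ex:article1"))  # ['ex:notebook1']
```

Because every fact is a separate triple, a question such as "which notebooks of this article were reproduced?" becomes a chain of small pattern matches rather than a hand-written join across database tables.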
The authors demonstrate the utility of the knowledge graph by providing a collection of example queries that address a range of use cases, from identifying successfully reproduced notebooks to analyzing the programming languages and error patterns in the notebooks. The knowledge graph is made accessible through a web service, allowing users to explore the data without the need to install any software.
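As a rough illustration of how such a query could be sent to a web service, the sketch below builds a GET request URL in the style of the SPARQL 1.1 Protocol, which carries the query text in a `query` parameter. It assumes the service follows that protocol. The endpoint address and the `ex:` vocabulary are hypothetical placeholders, not the project's real URL or terms, and the code only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of the actual web service.
ENDPOINT = "https://example.org/sparql"

# Hypothetical vocabulary (ex:) standing in for the graph's real terms.
QUERY = """
PREFIX ex: <https://example.org/vocab/>
SELECT ?notebook WHERE {
  ?notebook ex:reproductionStatus "success" .
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

A client would then fetch this URL and typically request a result format such as JSON through an `Accept` header; that step is omitted because it depends on the actual service.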
The authors discuss how this semantic approach to data sharing can strengthen adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles and help identify and communicate best practices in areas such as data quality, standardization, automation, and reproducibility.
Statistics
The FAIR Jupyter knowledge graph consists of approximately 190 million triples and occupies about 20.6 GB.
The construction of the knowledge graph took a total of 1251.7 seconds.
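For a sense of scale, these figures can be turned into back-of-the-envelope rates. The sketch below assumes that 1 GB means 10^9 bytes and that the reported construction time covers all of the triples; neither assumption is confirmed by the source, so the results are only indicative.

```python
# Figures as reported in the statistics above (both approximate).
TRIPLE_COUNT = 190_000_000
SIZE_BYTES = 20.6e9        # assumption: 1 GB = 10**9 bytes
BUILD_SECONDS = 1251.7

triples_per_second = TRIPLE_COUNT / BUILD_SECONDS
bytes_per_triple = SIZE_BYTES / TRIPLE_COUNT

print(f"{triples_per_second:,.0f} triples/s")
print(f"{bytes_per_triple:.0f} bytes/triple")
```

Under these assumptions, construction ran at roughly 150,000 triples per second, and each triple takes a little over 100 bytes on average.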
Quotes
"Enabling students and instructors to do this – or indeed anyone else, from reproducibility researchers to journal editors or package maintainers – is what we are aiming at."
"Such queries may provide details about any of the variables from the original dataset, highlight relationships between them or combine some of the graph's content with materials from corresponding external resources."