Core Concept
The Perseus Digital Library introduces its sixth generation, featuring the ATLAS workflow, a system designed to integrate and present a wide range of open and born-digital philological data, moving beyond traditional print-based limitations.
Summary
This article details the development and features of the sixth generation of the Perseus Digital Library, focusing on its new ATLAS (Aligned Text and Linguistic Annotation Server) architecture. This version marks a significant shift from previous iterations, moving beyond digitized print materials to incorporate a vast array of born-digital, open-licensed philological data.
Background and Motivation
The article begins by outlining the history and guiding principles of the Perseus Project, emphasizing its commitment to data integration and sustainability since its inception in 1985. It highlights the project's early focus on integrating textual and visual information, as well as its use of automatic analysis for linking different data classes. The authors emphasize the importance of TEI XML for data longevity and the project's commitment to open licenses for broader scholarly engagement.
Evolution of Perseus and the Need for ATLAS
The article then traces the evolution of the Perseus Digital Library through its five previous versions, each building upon the last in terms of features and content. It highlights the limitations of earlier versions in handling the increasing volume and complexity of born-digital annotations, such as treebanks, translation alignments, and metrical analyses. This need led to the development of the ATLAS architecture.
ATLAS Architecture and Data Model
The article provides a detailed overview of the ATLAS architecture and its use of the Canonical Text Services (CTS) data model for integrating data from various sources. It explains how ATLAS simplifies data ingestion by using a flat TSV format alongside CTS-compliant TEI XML. The authors then delve into specific examples of annotation classes managed within ATLAS, including:
- Scaife Texts: Integration of existing texts from the Scaife Viewer.
- Morpho-syntactic Analysis: Layered approach to linguistic annotation, incorporating curated, hybrid, and automatically generated treebanks.
- Dictionaries: Conversion of Perseus dictionaries into a structured JSON format.
- Textual Notes and Alignments: Representation of textual variants and alignments between source texts and translations.
- Syntax Trees: JSON representation of treebanks, with plans to adopt the Universal Dependencies (UD) tagset.
- Audio Annotations: Alignment of text chunks with recorded performances.
- Attributions/Credits: Preservation and aggregation of fine-grained credits for all annotations, a crucial aspect of ATLAS that ensures proper attribution for scholarly contributions.
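
To make the ingestion idea concrete, the following is a minimal sketch of how annotations keyed by CTS URNs might be read from a flat TSV file and how per-contributor credits might be tallied. It is not taken from the ATLAS codebase: the column names (`urn`, `kind`, `value`, `credit`), the sample rows, and the helper names `parse_cts_urn` and `load_annotations` are all assumptions made for illustration. Only the general URN shape `urn:cts:<namespace>:<work>[:<passage>]` follows the CTS convention.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # e.g. "greekLit"
    work: str                 # e.g. "tlg0012.tlg001.perseus-grc2"
    passage: Optional[str]    # e.g. "1.1", or None when the URN names a whole work


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN of the form urn:cts:<namespace>:<work>[:<passage>]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(parts[2], parts[3], passage)


# Hypothetical flat TSV layout (an assumption, not the actual ATLAS schema):
# one annotation per row, keyed by CTS URN, with a credit column.
SAMPLE_TSV = (
    "urn\tkind\tvalue\tcredit\n"
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1\tlemma\tμῆνις\tannotator-a\n"
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1\tmeter\thexameter\tannotator-b\n"
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.2\tlemma\tοὐλόμενος\tannotator-a\n"
)


def load_annotations(tsv_text: str):
    """Group annotation rows by passage and count rows per credited contributor."""
    by_passage = {}
    credits = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urn = parse_cts_urn(row["urn"])
        by_passage.setdefault(urn.passage, []).append((row["kind"], row["value"]))
        credits[row["credit"]] += 1
    return by_passage, credits


by_passage, credits = load_annotations(SAMPLE_TSV)
```

In this sketch every annotation class shares the passage reference as its key, so lemmas, metrical labels, and other layers can be displayed side by side for the same passage, while the `credits` counter keeps attribution at the level of individual annotations.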
Future Directions
The article concludes by outlining the next steps for the project, including:
- Expanding the services offered by the ATLAS server.
- Refining and augmenting the ATLAS data available on GitHub.
- Integrating the ATLAS backend and user interface components developed for the "Beyond Translation" project into the existing Scaife architecture.
Overall, the article presents a compelling case for the importance of open philology and the role of sophisticated digital libraries like Perseus in facilitating deeper engagement with complex textual data. The development of the ATLAS workflow signifies a major step forward in this domain, offering a robust and scalable framework for integrating, analyzing, and presenting a wealth of philological information.
Statistics
The Perseus Digital Library currently includes 2,669 works in 3,776 editions and translations (1,941 in Greek and 631 in Latin), with 83.8 million words in all languages (40.6 million in Greek, 16.4 million in Latin).
More than one million words each of Greek and of Latin are available in manually treebanked form.
Machine-actionable metrical analyses are available for more than 250,000 lines of Greek and Latin poetry.
The accuracy of automatically generated alignments between Greek and Latin source texts and English translations is approximately 80%.
Quotes
"Sustainable integration of different categories of data has been a driving force behind the development of Perseus from the beginning."
"Our goal was to create a workflow to organize, rather than create, textual data that had been produced by, and was available in, platforms that were open but separate."
"Perseus 6 was designed to be a publishing workflow that organizes complementary data into an integrated reading environment."
"Arguably the most important challenge that we face was to preserve and to aggregate fine-grained credits for born-digital annotations."