The authors propose a computational approach to automatically cluster the running titles found at the top of pages in early modern printed books. The goal is to detect which skeleton formes were used during printing: the running titles were kept standing in the skeleton forme and reused across the pages printed with it, so recurring titles reveal recurring skeletons.
The authors design two custom kernel functions to measure the visual similarity between running titles. The first is a Levenshtein-based kernel that captures differences in character spacing and shape. The second is a neural kernel built on a Vision Transformer cross-encoder trained on synthetic data. They then apply spectral clustering to the resulting similarity matrices to group running titles printed with the same skeleton forme.
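The exact kernel definitions are not given in this summary. As a minimal sketch, assume each running title has already been converted into a sequence of tokens that pair a glyph label with a quantized gap width to the next glyph (a hypothetical encoding). Under that assumption, a Levenshtein distance d can be turned into a similarity with exp(-d / sigma), and the resulting matrix can then be clustered spectrally. The names `levenshtein`, `levenshtein_kernel`, `spectral_clustering` and the parameter `sigma` are illustrative and are not the authors' code. The spectral step follows the standard normalized-affinity recipe (top-k eigenvectors, row normalization, k-means) and uses only NumPy.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance between two sequences with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def levenshtein_kernel(titles, sigma=2.0):
    """Similarity matrix K[i, j] = exp(-levenshtein(t_i, t_j) / sigma)."""
    n = len(titles)
    K = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = np.exp(-levenshtein(titles[i], titles[j]) / sigma)
    return K


def spectral_clustering(K, k, n_iter=100):
    """Normalized spectral clustering on a symmetric, non-negative affinity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]   # D^-1/2 K D^-1/2
    _, vecs = np.linalg.eigh(A)                          # eigenvalues in ascending order
    U = vecs[:, -k:]                                     # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # rows onto the unit sphere
    # k-means on the embedded rows, with deterministic farthest-point initialization
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical encoding: glyph letter + quantized gap width to the next glyph.
titles = [
    ["L3", "e2", "v2", "i1", "a2", "t2"],
    ["L3", "e2", "v2", "i1", "a2", "t2"],   # same setting of type
    ["L3", "e2", "v2", "i1", "a2", "t3"],   # one gap differs
    ["L5", "e1", "v3", "i2", "a1", "t1"],
    ["L5", "e1", "v3", "i2", "a1", "t1"],
    ["L5", "e1", "v3", "i2", "a2", "t1"],
]
K = levenshtein_kernel(titles, sigma=2.0)
labels = spectral_clustering(K, k=2)
print(labels)
```

An exponentiated edit distance is not guaranteed to be positive semi-definite. It is therefore safer to treat K as a non-negative affinity than as a Mercer kernel, and spectral clustering only needs symmetry and non-negativity. Where scikit-learn is available, `sklearn.cluster.SpectralClustering(affinity="precomputed")` accepts such a matrix directly.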
The authors evaluate their approach on a newly introduced dataset of 8 early modern books spanning different formats, with ground-truth skeleton forme clusters annotated by expert bibliographers. They find that the domain-informed Levenshtein kernel significantly outperforms the neural approach. They also find that exploiting the book's gathering structure to aggregate similarities across each sheet side greatly improves performance compared with using individual pages or recto pages alone.
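The gathering-based aggregation can be illustrated with a small sketch. It assumes quarto gatherings of one sheet (four leaves, eight pages), where the outer forme prints pages 1, 4, 5 and 8 and the inner forme prints pages 2, 3, 6 and 7. Two sheet sides are scored by averaging page-level similarities over running titles in corresponding positions. The position alignment, the averaging rule, the toy data and all function names are illustrative assumptions, not the authors' procedure. The page-level matrix could come from either of the kernels described above.

```python
import numpy as np

# Standard quarto imposition: which pages of a 4-leaf gathering (1-indexed)
# are printed by the outer and inner forme. The ORDER within each list is an
# ASSUMED chase-position order, used only to align titles between formes.
QUARTO_OUTER = [1, 4, 5, 8]
QUARTO_INNER = [2, 3, 6, 7]


def forme_page_indices(n_gatherings, pages_per_gathering=8):
    """List (label, global 0-based page indices) for every sheet side."""
    formes = []
    for g in range(n_gatherings):
        base = g * pages_per_gathering
        for name, pages in (("outer", QUARTO_OUTER), ("inner", QUARTO_INNER)):
            formes.append((f"{g}-{name}", [base + p - 1 for p in pages]))
    return formes


def aggregate_to_formes(K_pages, formes):
    """Forme-level similarity: mean page-level similarity over aligned positions."""
    m = len(formes)
    S = np.zeros((m, m))
    for a, (_, pages_a) in enumerate(formes):
        for b, (_, pages_b) in enumerate(formes):
            S[a, b] = np.mean([K_pages[i, j] for i, j in zip(pages_a, pages_b)])
    return S


# Hypothetical toy data: two quarto gatherings (16 pages); both outer formes
# used skeleton "A" and both inner formes used skeleton "B". Each running title
# is identified by (skeleton, position), and page similarity is 1.0 for the
# same physical title and 0.1 otherwise.
formes = forme_page_indices(n_gatherings=2)
title_id = {}
for label, pages in formes:
    skeleton = "A" if label.endswith("outer") else "B"
    for pos, page in enumerate(pages):
        title_id[page] = (skeleton, pos)

n_pages = 16
K_pages = np.array([[1.0 if title_id[i] == title_id[j] else 0.1
                     for j in range(n_pages)] for i in range(n_pages)])
S = aggregate_to_formes(K_pages, formes)
print([label for label, _ in formes])
print(np.round(S, 2))
```

Pooling four running titles per sheet side means that a single ambiguous title no longer decides the comparison. This is one plausible reason why aggregation helps compared with single pages or rectos alone. The resulting S can then be clustered in place of the page-level matrix.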
The authors also provide a detailed qualitative analysis of their results on Leviathan, showing how the automated clustering can uncover important bibliographic insights about the printing process, such as the use of multiple presses, pauses in printing, and cancellations.
Key insights distilled from the paper by Nikolai Vogl... (arxiv.org, 05-03-2024): https://arxiv.org/pdf/2405.00752.pdf