Core Concepts
Speech technology supports the processing and analysis of oral history recordings through efficient transcription, speaker attribution, and enrichment of the interview narratives.
Abstract
The article discusses various speech technology solutions and services that can enhance oral history research. It covers the following key points:
Webservices at BAS:
BAS provides a range of multilingual speech processing web services, including channel separation, grapheme-to-phoneme conversion, automatic speech alignment, and anonymization.
The Transcription Portal is a zero-configuration service for transcribing oral history recordings, allowing users to select language, apply processing steps, and export transcripts in various formats.
Octra Backend is software for managing transcription projects, with a focus on data privacy and access control.
LINDAT for Oral Historians:
LINDAT offers a web-based automatic speech recognition (ASR) engine, UWebASR, which utilizes state-of-the-art wav2vec models fine-tuned for oral history interviews in English, Czech, Slovak, and German.
The service provides continuous speech recognition with post-processing for case restoration, sentence segmentation, and punctuation.
LINDAT is also developing an approach for generating contextually relevant questions that aid the understanding of oral history testimonies.
Do-it-yourself with Whisper:
Whisper, an open-source ASR toolkit from OpenAI, has gained popularity for its ability to handle a wide range of languages and its robustness to noise and dialects.
The article discusses the benefits of using Whisper, such as its open-source nature and the improved readability of its transcripts, as well as ongoing efforts to enhance its performance and accessibility.
Various initiatives, such as WhisperX, Fast-Whisper, and aTrain, are working on improving Whisper's recognition accuracy, speed, and user experience.
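As a rough illustration of the do-it-yourself route, the sketch below assumes that the open-source `openai-whisper` Python package and ffmpeg are installed. It transcribes a recording and prints each recognized segment with its start time. The model size `base`, the helper names, and the file name `interview.wav` are illustrative assumptions, not recommendations taken from the article.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments) -> str:
    """Turn Whisper-style segments (dicts with 'start' and 'text') into lines."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe(path: str, model_name: str = "base", language=None) -> str:
    """Transcribe an audio file with Whisper and return a timestamped transcript."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the language itself.
    result = model.transcribe(path, language=language)
    return render_transcript(result["segments"])


# Hypothetical usage (the file name is an assumption):
# print(transcribe("interview.wav", language="en"))
```

Keeping the formatting helpers separate from the model call means the same rendering could be reused with output from other Whisper variants, provided they expose segments with `start` and `text` fields.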
Remaining Challenges:
While transformer-based speech models perform exceptionally well at generating clean, punctuated transcripts, they may fall short in capturing linguistic phenomena relevant for discourse analysis, such as disfluencies and pause durations.
Speaker diarization, which attributes the text output to the interviewer and interviewee(s), is another area that requires further advancements.
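Both challenges can be approximated in post-processing, as the following minimal sketch suggests. It assumes transcript segments given as dictionaries with `start`, `end`, and `text` fields, and diarization output given as `(start, end, speaker)` tuples produced by a separate tool. The function names, the 0.5-second pause threshold, and the maximum-overlap rule are illustrative assumptions, not the method described in the article.

```python
def pauses_between(segments, min_pause=0.5):
    """Return (prev_end, next_start, duration) for each gap of at least min_pause seconds."""
    pauses = []
    for prev, nxt in zip(segments, segments[1:]):
        gap = nxt["start"] - prev["end"]
        if gap >= min_pause:
            pauses.append((prev["end"], nxt["start"], round(gap, 3)))
    return pauses


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Segments that no diarization turn covers are marked explicitly.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

Because segment timestamps from ASR output can be imprecise, pause values obtained this way are rough estimates. Discourse-analytic work that depends on exact pause durations would likely still need forced alignment or manual correction.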
The article highlights the significant progress in speech technology and its potential to enhance oral history research, while also acknowledging the ongoing challenges and the need for continued development in this field.