The paper presents an approach to building cost-efficient, robust, and linguistically diverse automatic speech recognition (ASR) systems for African-accented speech. The key insights are:
The authors propose an iterative model adaptation process that selects training data based on epistemic uncertainty, reducing the amount of labeled data required while outperforming several high-performing ASR models.
The approach improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for accented African clinical speech, where training data is scarce.
The authors investigate trends in domain selection (clinical, general, and both) across adaptation rounds, finding that the most uncertain samples from linguistically rich and diverse accents provide the best learning signal for the model.
The authors establish strong baselines for the nascent field of African clinical ASR, providing a foundation for further exploration in this research direction.
The approach is shown to be effective across different ASR model architectures and datasets, indicating that it is both model-agnostic and dataset-agnostic.
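
The iterative, uncertainty-driven selection described above can be illustrated with a minimal sketch. It assumes that epistemic uncertainty is approximated by the variance of a per-sample score over repeated stochastic forward passes (for example, with dropout left active); the summary does not state the authors' exact estimator, so this is an assumption. The names `predict`, `fine_tune`, `select_most_uncertain`, and `adaptation_rounds` are hypothetical placeholders, not the paper's API.

```python
import statistics
from typing import Callable, List, Sequence


def epistemic_uncertainty(predict: Callable[[str], float], sample: str,
                          passes: int = 10) -> float:
    """Assumed proxy: variance of a score over repeated stochastic passes."""
    scores = [predict(sample) for _ in range(passes)]
    return statistics.pvariance(scores)


def select_most_uncertain(pool: Sequence[str], predict: Callable[[str], float],
                          k: int, passes: int = 10) -> List[str]:
    """Return the k pool samples with the highest estimated uncertainty."""
    scored = [(epistemic_uncertainty(predict, s, passes), s) for s in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def adaptation_rounds(pool: Sequence[str], predict: Callable[[str], float],
                      fine_tune: Callable[[List[str]], None],
                      rounds: int, k: int) -> List[str]:
    """Each round: pick the k most uncertain samples, add them to the
    labeled set, and fine-tune on everything selected so far."""
    labeled: List[str] = []
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        picked = select_most_uncertain(remaining, predict, k)
        labeled.extend(picked)
        remaining = [s for s in remaining if s not in picked]
        fine_tune(labeled)  # hypothetical hook: label and train on the selection
    return labeled
```

In practice `predict` would wrap a stochastic forward pass of the ASR model and return a score such as a word error rate estimate, and `fine_tune` would trigger annotation and training; both are left abstract here because the summary gives no implementation details.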
by Bonaventure ... at arxiv.org 05-07-2024
https://arxiv.org/pdf/2306.02105.pdf
Deeper Questions