The paper presents an approach to building cost-efficient, robust, and linguistically diverse automatic speech recognition (ASR) systems for African-accented speech. The key insights are:
The authors propose an iterative model adaptation process that uses epistemic uncertainty-based data selection to reduce the required amount of labeled data while outperforming several high-performing ASR models.
The approach improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for accented African clinical speech, where training data are scarce.
The authors investigate trends in domain selection (clinical, general, and both) across adaptation rounds, finding that the most uncertain samples from linguistically rich and diverse accents provide the best learning signal for the model.
The authors establish strong baselines for the nascent field of African clinical ASR, providing a foundation for further exploration in this research direction.
The approach is effective across different ASR model architectures and datasets, showing that it is both model-agnostic and dataset-agnostic.
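As a rough illustration of the uncertainty-based selection loop summarized above, the following sketch ranks unlabeled samples by how much their scores vary across several stochastic forward passes and picks the most uncertain ones in each round. It assumes uncertainty is approximated by the standard deviation of per-sample error scores (for example, with dropout left active); the function names, the `score_fn` callback, and the example numbers are hypothetical and not taken from the paper.

```python
import statistics
from typing import Callable, Dict, List


def epistemic_uncertainty(pass_scores: List[float]) -> float:
    """Spread of one sample's scores across stochastic forward passes (assumed proxy)."""
    return statistics.pstdev(pass_scores)


def select_most_uncertain(pool_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k samples whose scores vary the most."""
    ranked = sorted(
        pool_scores,
        key=lambda sid: epistemic_uncertainty(pool_scores[sid]),
        reverse=True,
    )
    return ranked[:k]


def run_rounds(
    pool: List[str],
    score_fn: Callable[[str], List[float]],
    k: int,
    rounds: int,
) -> List[str]:
    """Iteratively move the k most uncertain samples from the pool to the selected set."""
    selected: List[str] = []
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        # Hypothetical callback: per-sample scores from several stochastic passes.
        scores = {sid: score_fn(sid) for sid in remaining}
        picked = select_most_uncertain(scores, k)
        selected.extend(picked)
        remaining = [sid for sid in remaining if sid not in picked]
        # In a real pipeline, the model would be fine-tuned on `selected` here
        # before the pool is scored again in the next round.
    return selected


# Made-up scores for three unlabeled samples across three passes.
example_scores = {
    "a": [0.1, 0.1, 0.1],
    "b": [0.0, 0.5, 1.0],
    "c": [0.2, 0.3, 0.2],
}
top_two = select_most_uncertain(example_scores, k=2)
picked_over_rounds = run_rounds(list(example_scores), example_scores.__getitem__, k=1, rounds=2)
```

In this sketch the scores are fixed, so repeated rounds simply walk down the ranking; with a model that is re-adapted between rounds, the ranking would be expected to change as the model learns from the newly labeled samples.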
Key insights distilled from the paper by Bonaventure ... at arxiv.org, 05-07-2024
https://arxiv.org/pdf/2306.02105.pdf