Core Concept
Developing cost-efficient, robust, and linguistically diverse automatic speech recognition systems for African accents by leveraging epistemic uncertainty-based data selection.
Summary
The paper presents an approach to building cost-efficient, robust, and linguistically diverse automatic speech recognition (ASR) systems for African-accented speech. The key insights are:
- The authors propose an iterative model adaptation process that uses epistemic uncertainty-based data selection to reduce the required amount of labeled data while outperforming several high-performing ASR models.
- The approach improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models in the context of accented African clinical ASR, where training data is scarce.
- The authors investigate trends in domain selection (clinical, general, and both) across adaptation rounds, finding that the most uncertain samples from linguistically rich and diverse accents provide the best learning signal for the model.
- The authors establish strong baselines for the nascent field of African clinical ASR, providing a foundation for further exploration in this research direction.
- The approach is shown to be effective across different ASR model architectures and datasets, demonstrating that it is model- and dataset-agnostic.
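The iterative, uncertainty-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes epistemic uncertainty is approximated as the variance of a per-sample score (for example, word error rate) across several stochastic forward passes, which this summary does not specify. The names `epistemic_uncertainty`, `select_most_uncertain`, `adaptation_rounds`, and the `score_fn` callback are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def epistemic_uncertainty(scores: Sequence[float]) -> float:
    """Assumed proxy: population variance of one sample's scores
    across repeated stochastic forward passes."""
    return statistics.pvariance(scores)


def select_most_uncertain(pool_scores: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k samples with the highest uncertainty."""
    ranked = sorted(
        pool_scores,
        key=lambda sample_id: epistemic_uncertainty(pool_scores[sample_id]),
        reverse=True,
    )
    return ranked[:k]


def adaptation_rounds(
    pool_ids: Sequence[str],
    score_fn: Callable[[str], float],  # hypothetical: one stochastic pass -> score
    rounds: int,
    k: int,
    passes: int = 5,
) -> List[str]:
    """Pick k uncertain samples per round and move them to the labeled set."""
    labeled: List[str] = []
    remaining = list(pool_ids)
    for _ in range(rounds):
        if not remaining:
            break
        # Score every remaining sample several times to estimate variance.
        scores = {sid: [score_fn(sid) for _ in range(passes)] for sid in remaining}
        picked = select_most_uncertain(scores, k)
        labeled.extend(picked)
        remaining = [sid for sid in remaining if sid not in picked]
        # A real system would fine-tune the ASR model on `labeled` here,
        # so that the next round's scores reflect the adapted model.
    return labeled


if __name__ == "__main__":
    toy_pool = {
        "clip_a": [0.10, 0.10, 0.10],  # stable predictions, low uncertainty
        "clip_b": [0.00, 0.50, 1.00],  # unstable predictions, high uncertainty
        "clip_c": [0.20, 0.30, 0.20],
    }
    print(select_most_uncertain(toy_pool, k=2))
```

In this toy pool, `clip_b` ranks first because its scores vary most between passes. In practice `score_fn` would wrap a model evaluated with dropout left active, and the labeling budget per round (`k`) controls how much annotation cost is spent.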
Statistics
The use of speech recognition led to a 19-92% decrease in average documentation time, 50.3-100% decrease in turnaround time, and 17% improvement in documentation quality.
Many African countries face a shortage of skilled health personnel, with a ratio of 1.55 health workers per 1,000 persons, well below the WHO-recommended 4.45 health workers per 1,000 persons.
The AfriSpeech-200 dataset used in the study contains 200 hours of Pan-African accented English speech, representing 13 Anglophone countries across sub-Saharan Africa and the US.
Quotes
"Clinical automatic speech recognition (ASR) is an active area of research (Kodish-Wachs et al., 2018; Finley et al., 2018; Zapata and Kirkedal, 2015)."
"Several studies (Blackley et al., 2019; Goss et al., 2019; Blackley et al., 2020; Ahlgrim et al., 2016; Vogel et al., 2015) showed that the use of speech recognition led to a 19-92% decrease in average documentation time, 50.3-100% decrease in turnaround time, and 17% improvement in documentation quality."
"In the African context where the patient burden is high (Oleribe et al., 2019; Naicker et al., 2009; Nkomazana et al., 2015) and staffing is inadequate (who; Ahmat et al., 2022; Naicker et al., 2010; Nkomazana et al., 2015; Kinfu et al., 2009), clinical ASR systems have great potential to reduce daily documentation burden."