This paper describes an industrial-scale multilingual automatic speech recognition (ASR) system that combines a diverse training dataset with a robust model architecture to achieve competitive performance and practical advantages over state-of-the-art open-source models.
This paper addresses how to efficiently integrate new low-resource languages into a pre-trained multilingual automatic speech recognition (ASR) foundation model while maintaining performance on existing languages.
This research paper introduces csvMASR, a novel configurable multilingual automatic speech recognition (MASR) model that leverages speech summary vector representations and adapter modules to achieve improved performance and configurability compared to existing MASR models.
A two-stage transliteration approach, projecting graphemes from multiple languages to a common script (Devanagari), significantly improves the performance of end-to-end multilingual automatic speech recognition (ASR) systems by reducing speech-class confusion.
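
To make the idea of projecting graphemes onto a common script concrete, here is a minimal Python sketch. It is not the two-stage method from the summarized paper, whose details are not given here. It only illustrates one simple, approximate projection, assuming that the major Indic Unicode blocks share a parallel layout so that a fixed code point offset maps a character to its Devanagari counterpart. The names `BLOCK_START` and `to_devanagari` are hypothetical, and the offset mapping ignores script-specific characters that have no Devanagari equivalent.

```python
# Illustrative sketch only: an offset-based projection of Indic graphemes
# onto Devanagari. This is an assumption-laden stand-in, not the paper's method.

# Start code points of some Indic script blocks in Unicode.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 code points


def to_devanagari(text: str, script: str) -> str:
    """Shift characters of `script` into the Devanagari block.

    Characters outside the source script's block (spaces, digits,
    Latin letters, punctuation) are passed through unchanged.
    """
    start = BLOCK_START[script]
    offset = BLOCK_START["devanagari"] - start
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + BLOCK_SIZE:
            out.append(chr(cp + offset))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Bengali KA (U+0995) and Tamil KA (U+0B95) both land on Devanagari KA (U+0915).
    print(to_devanagari("\u0995", "bengali") == "\u0915")
    print(to_devanagari("\u0B95", "tamil") == "\u0915")
```

With a shared output script, acoustically similar sounds written in different scripts collapse to the same target label, which is the intuition behind the reduced speech-class confusion described above.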