Bhandari, N., Chen, D., del Río Fernández, M. A., Delworth, N., Drexler Fox, J., Jetté, M., ... & Robichaud, J. (2024). Reverb: Open-Source ASR and Diarization from Rev. arXiv preprint arXiv:2410.03930.
This paper introduces Reverb, an open-source release of Rev's automatic speech recognition (ASR) and diarization models, intended to advance research and innovation in voice technology.
Reverb ASR, built on the WeNet framework, was trained on 200,000 hours of human-transcribed English speech, which the authors describe as the largest corpus used to train an open-source ASR model. It uses a joint CTC/attention architecture and offers verbatimicity control, letting users choose between fully verbatim output (preserving hesitations and false starts) and cleaner, non-verbatim transcripts. The Reverb diarization models, built on the Pyannote framework, were fine-tuned on 26,000 hours of expertly labeled data.
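Because the diarization models are Pyannote-based, they can be driven with the standard pyannote.audio pipeline API. Below is a minimal sketch under that assumption; the Hugging Face model ID "Revai/reverb-diarization-v2" and the audio path are illustrative, so check the Reverb release for the exact checkpoint names.

```python
# Minimal diarization sketch using pyannote.audio.
# Assumptions: the Reverb diarization checkpoint is published on Hugging Face
# under "Revai/reverb-diarization-v2" (loading may require an auth token),
# and "meeting.wav" is a local audio file.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("Revai/reverb-diarization-v2")

# Apply the pipeline to an audio file; returns a pyannote Annotation object.
diarization = pipeline("meeting.wav")

# Print one line per detected speaker turn: start time, end time, speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s  {turn.end:6.1f}s  {speaker}")
```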
Reverb delivers accurate and efficient ASR and diarization, outperforming existing open-source models on long-form speech recognition benchmarks. By providing robust, adaptable models, the release encourages further research and development in voice technology.
Reverb's open-source release gives researchers and developers access to high-performing models trained at a scale previously unavailable in open ASR, fostering innovation and enabling new applications and advancements in voice technology.
While Reverb excels at long-form speech, its performance on short-form tasks such as voice search requires further investigation. Optimizing the models for diverse audio lengths and expanding language support beyond English are potential avenues for future work.