Core Concepts
The work aims to increase the amount of high-quality Yorùbá speech data available for Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks.
Statistics
We curated about 23,000 text sentences.
We created about 42 hours of speech data recorded by 80 volunteers.
For ASR, we obtained a baseline word error rate (WER) of 23.8.
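For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used for the corpus. It assumes whitespace tokenization and reports the result as a percentage (on the assumption that the 23.8 above is also a percentage); the function name and the placeholder sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many insertions.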
Quotes
"We introduce ÌròyìnSpeech, a new corpus influenced by the desire to increase the amount of high quality, contemporary Yorùbá speech data."
"Our TTS evaluation suggests that a high-fidelity, general domain, single-speaker Yorùbá voice is possible with as little as 5 hours of speech."