toplogo
登录

ÌròyìnSpeech: Yorùbá Speech Corpus for TTS and ASR


核心概念
Increasing high-quality Yorùbá speech data for Text-to-Speech and Automatic Speech Recognition tasks.
摘要

1. Introduction

  • Lack of voice-enabled applications in African languages.
  • Efforts to build multilingual datasets for speech recognition.
  • Focus on Yorùbá language with limited contemporary speech research.

2. The Yorùbá Language

  • Description of the Yorùbá language and its characteristics.

3. The ÌròyìnSpeech Corpus

  • Preparation of text sentences from news and fictional texts.
  • Recording of text sentences for ASR and TTS.
  • Post-production processes to ensure quality.

4. Speech Synthesis Experiments

  • Training speech synthesis models with different approaches.
  • Evaluation of models with subjective and objective metrics.
  • Impact of training with diacritics on synthesized voices.

5. Automatic Speech Recognition Experiments

  • Training baseline ASR models and exploring different approaches.
  • Comparison of end-to-end Conformer model and finetuned wav2vec 2.0.

6. Conclusion

  • Presentation of the Yorùbá speech dataset for research.
  • Findings on speech synthesis and ASR experiments.

7. Ethics Statement

  • Consent of volunteers and privacy considerations.

8. Acknowledgements

  • Recognition of support and funding for the project.

9. Bibliographical References

  • List of references for further reading.
edit_icon

自定义摘要

edit_icon

使用 AI 改写

edit_icon

生成参考文献

translate_icon

翻译原文

visual_icon

生成思维导图

visit_icon

访问来源

统计
We curated about 23,000 text sentences. We created about 42 hours of speech data recorded by 80 volunteers. For ASR, we obtained a baseline word error rate (WER) of 23.8.
引用
"We introduce ÌròyìnSpeech, a new corpus influenced by the desire to increase the amount of high quality, contemporary Yorùbá speech data." "Our TTS evaluation suggests that a high-fidelity, general domain, single-speaker Yorùbá voice is possible with as little as 5 hours of speech."

从中提取的关键见解

by Tolulope Ogu... arxiv.org 03-28-2024

https://arxiv.org/pdf/2307.16071.pdf
ÌròyìnSpeech

更深入的查询

How can the ÌròyìnSpeech dataset be utilized beyond TTS and ASR applications?

The ÌròyìnSpeech dataset can be utilized beyond Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) applications in various ways. One significant application is in linguistic research and analysis. Researchers can use the dataset to study the phonetics, tonality, and linguistic nuances of the Yorùbá language. It can also be used for language learning and education purposes, providing authentic audio samples for students to practice listening and pronunciation. Additionally, the dataset can support the development of language-related technologies such as language identification, sentiment analysis, and emotion recognition in Yorùbá speech. Cultural preservation projects could leverage the dataset to create interactive language learning tools, digital archives of oral traditions, and multimedia experiences that showcase the richness of the Yorùbá language and culture.

What are the potential drawbacks of relying on crowd-sourced recordings for speech data?

While crowd-sourced recordings offer many benefits, such as scalability, diversity, and cost-effectiveness, there are potential drawbacks to consider. One major concern is the quality control of the recordings. Since crowd-sourced data relies on contributions from volunteers, there may be inconsistencies in recording quality, background noise, accent variations, and pronunciation accuracy. Ensuring the accuracy and reliability of the data can be challenging, especially when dealing with languages with specific tonal or phonetic requirements like Yorùbá. Another drawback is the potential lack of linguistic expertise among the volunteers, which could lead to mispronunciations, errors in intonation, or improper emphasis on certain words or phrases. Additionally, managing the ethical considerations of using crowd-sourced data, such as ensuring informed consent, protecting user privacy, and handling sensitive information, can pose challenges for researchers and project managers.

How can the findings of this research contribute to preserving and promoting the Yorùbá language and culture?

The findings of this research can contribute significantly to preserving and promoting the Yorùbá language and culture in several ways. By creating a high-quality speech corpus like ÌròyìnSpeech, researchers and developers can advance the development of language technologies that support Yorùbá, making it more accessible in digital environments. This can help bridge the gap between traditional oral communication and modern digital platforms, preserving the language for future generations. Furthermore, the research outcomes can raise awareness about the importance of linguistic diversity and cultural heritage, encouraging initiatives to protect and celebrate the Yorùbá language. The availability of resources like the ÌròyìnSpeech dataset can empower Yorùbá speakers to engage with their language in new ways, fostering a sense of pride and identity in their linguistic and cultural heritage.
0
star