This paper advances Kurdish text-to-speech (TTS) technology by introducing the first Kurdish TTS vocoder, trained on a 21-hour Kurdish speech corpus. The researchers adapted the WaveGlow deep learning architecture to Kurdish, optimizing it for the language's acoustic properties to produce clear and natural speech output.
The study begins by discussing the challenges of developing high-quality TTS systems for low-resource languages like Kurdish, for which linguistic information and dedicated resources are scarce. The researchers used the existing "Sabat Speech Corpus", which contains 10,979 utterances across diverse categories, to train the Kurdish WaveGlow vocoder from scratch, without relying on any pre-trained models.
The paper then provides an overview of the Tacotron2 TTS model and the WaveGlow vocoder architecture. WaveGlow employs a series of invertible transformations, known as normalizing flows, to map samples from a simple Gaussian distribution to the complex distribution of the audio waveform, conditioned on the Mel spectrogram, enabling high-quality speech synthesis.
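The invertibility that normalizing flows rely on can be illustrated with a single affine coupling layer, the kind of building block WaveGlow stacks many times. The sketch below is a minimal NumPy illustration, not the paper's implementation: the function names, the tiny `tanh` network (a stand-in for WaveGlow's WaveNet-like conditioning network), and all array sizes are assumptions chosen for readability.

```python
import numpy as np

def _scale_and_shift(xa, cond, w, b):
    # Toy stand-in (assumption) for the network that predicts the affine parameters
    # from the untouched half of the input and the conditioning (Mel) features.
    h = np.tanh(w @ np.concatenate([xa, cond]) + b)
    log_s, t = np.split(h, 2)
    return log_s, t

def coupling_forward(x, cond, w, b):
    """Affine coupling: pass the first half through, affinely transform the second half."""
    xa, xb = np.split(x, 2)
    log_s, t = _scale_and_shift(xa, cond, w, b)
    yb = np.exp(log_s) * xb + t
    log_det = log_s.sum()  # log-determinant of the Jacobian, used in the likelihood
    return np.concatenate([xa, yb]), log_det

def coupling_inverse(y, cond, w, b):
    """Exact inverse: the first half is unchanged, so the same parameters can be recomputed."""
    ya, yb = np.split(y, 2)
    log_s, t = _scale_and_shift(ya, cond, w, b)
    xb = (yb - t) * np.exp(-log_s)
    return np.concatenate([ya, xb])

# Hypothetical sizes: 8 audio samples, 3 conditioning features.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
cond = rng.normal(size=3)
w = rng.normal(size=(8, 7))  # 7 inputs = 4 (half of x) + 3 (cond); 8 outputs = 4 log-scales + 4 shifts
b = rng.normal(size=8)

y, log_det = coupling_forward(x, cond, w, b)
assert np.allclose(coupling_inverse(y, cond, w, b), x)  # the layer is invertible
```

Because the first half of the input passes through unchanged, the inverse can recompute the same scale and shift, which is what lets a flow of this kind be trained in one direction and sampled in the other.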
The researchers conducted extensive experiments, training the Kurdish WaveGlow model for 120 hours across 5 days. The model demonstrated steady convergence, indicating effective learning of the acoustic properties and linguistic nuances of Kurdish speech.
To evaluate performance, the researchers selected 110 random sentences from various categories and conducted a Mean Opinion Score (MOS) assessment with 12 native Kurdish speakers. The results show that the Kurdish Tacotron2-Scratch (WaveGlow Kurdish-Scratch) model significantly outperformed the models using an English pre-trained WaveGlow, achieving a MOS of 4.91, which sets a new benchmark for Kurdish speech synthesis.
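As a rough illustration of how a MOS is computed, the sketch below averages listener ratings and attaches a normal-approximation 95% confidence interval. The 1 to 5 rating scale, the function name, the confidence interval, and the example ratings are all assumptions for illustration; they are not data or procedures taken from the paper.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    # Normal approximation (assumption): 1.96 standard errors on either side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Hypothetical ratings from 12 listeners for one synthesized sentence.
example_ratings = [5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 5]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

In a study like the one described, the same calculation would be pooled over all sentences and listeners for each system before the systems are compared.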
The paper concludes by highlighting its main contributions: the introduction of the first Kurdish-specific TTS vocoder and the successful adaptation of the WaveGlow architecture to the Kurdish language. The researchers emphasize that these advancements not only enhance Kurdish TTS but also offer scalable methodologies that can be applied to other Kurdish dialects and low-resource languages, broadening the impact of this work across different linguistic communities.