The key highlights and insights from the paper are:
The paper describes the development of acoustic models for automatic continuous speech recognition of Swedish using hidden Markov models (HMMs) and the SpeechDat database.
The acoustic models were built at the phonetic level, which makes them usable for general speech recognition applications, though a simplified task of digit and natural-number recognition was used for model evaluation.
Different types of phone models were tested, including context-independent models and two variations of context-dependent models (within-word and cross-word context expansion).
Extensive experiments were conducted to tune the system parameters, including the number of Gaussian mixture components and the use of retroflex allophones in the lexicon.
The models were evaluated on both the development set (50 speakers) and the evaluation set (200 speakers), with the best overall accuracy of 88.6% achieved using within-word context-expanded models with 8 Gaussian mixture components.
Per-speaker analysis showed that the models performed well across different speaker characteristics, with some exceptions for speakers from certain dialect regions.
The flexibility of the models was demonstrated by testing them on the Waxholm database, which had different characteristics compared to the SpeechDat data used for training.
Further improvements were suggested, such as increasing the number of Gaussian mixture components and exploring strategies to handle stationary noise in the telephone recordings.
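The difference between the within-word and cross-word context expansion mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the `left-phone+right` label notation, the function names, and the placeholder phone symbols are all assumptions chosen for illustration, and phones at a sequence edge simply keep whatever context is available.

```python
from typing import List, Optional


def label(left: Optional[str], phone: str, right: Optional[str]) -> str:
    """Build a 'left-phone+right' label, omitting any missing context.

    The notation is an assumption made for this sketch.
    """
    name = phone
    if left is not None:
        name = f"{left}-{name}"
    if right is not None:
        name = f"{name}+{right}"
    return name


def expand_sequence(phones: List[str]) -> List[str]:
    """Attach the immediate left and right neighbour to every phone."""
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < len(phones) - 1 else None
        out.append(label(left, p, right))
    return out


def expand_within_word(words: List[List[str]]) -> List[List[str]]:
    """Expand each word on its own: context never crosses a word boundary."""
    return [expand_sequence(phones) for phones in words]


def expand_cross_word(words: List[List[str]]) -> List[List[str]]:
    """Expand the whole utterance: context spans word boundaries."""
    flat = [p for phones in words for p in phones]
    labels = expand_sequence(flat)
    # Regroup the flat label list into the original word segmentation.
    expanded, pos = [], 0
    for phones in words:
        expanded.append(labels[pos:pos + len(phones)])
        pos += len(phones)
    return expanded


# Placeholder phone symbols (hypothetical, not taken from the paper's lexicon).
utterance = [["a", "b"], ["c"]]
print(expand_within_word(utterance))  # [['a+b', 'a-b'], ['c']]
print(expand_cross_word(utterance))   # [['a+b', 'a-b+c'], ['b-c']]
```

In this sketch, within-word expansion leaves the single-phone word as a context-independent model, while cross-word expansion gives it a left context from the preceding word. Cross-word expansion therefore produces more distinct model names for the same lexicon, which is one reason the two variants can behave differently with a fixed amount of training data.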