Polok, A., Kesiraju, S., Beneš, K., Burget, L., & Černocký, J. (2024). Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models. arXiv preprint arXiv:2410.17437.
This research paper investigates whether a simple regularization method applied to the decoder module of encoder-decoder Automatic Speech Recognition (ASR) systems can improve their robustness and generalization capabilities, particularly in out-of-domain scenarios.
The authors propose a novel method called Decoder-Centric Regularisation in Encoder-Decoder (DeCRED) architecture for ASR. This method introduces auxiliary classifiers in the intermediate layers of the decoder module during training. The researchers then evaluate the performance of DeCRED against a baseline encoder-decoder model and other state-of-the-art ASR systems like Whisper and OWSM. The evaluation includes both in-domain and out-of-domain datasets, focusing on Word Error Rate (WER) as the primary metric. Additionally, the authors analyze the internal language model of the trained models using Zero-Attention Internal Language Model (ILM) perplexity estimation to understand the impact of the proposed regularization scheme.
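The idea of auxiliary classifiers on intermediate decoder layers can be illustrated with a small sketch of the training objective. This is a minimal illustration, not the authors' implementation: it assumes the training loss is a weighted sum of token-averaged cross-entropy terms, one per supervised decoder layer. The function names (`cross_entropy`, `decred_style_loss`), the layer indices, and the weights are all hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def decred_style_loss(per_layer_logits, targets, layer_weights):
    """Weighted sum of token-averaged cross-entropy over selected decoder layers.

    per_layer_logits: {layer_index: [logit vector for each target position]}
    targets:          list of reference token ids
    layer_weights:    {layer_index: weight} (hypothetical values, an assumption)
    """
    total = 0.0
    for layer, weight in layer_weights.items():
        logits_seq = per_layer_logits[layer]
        layer_ce = sum(
            cross_entropy(logits, tgt) for logits, tgt in zip(logits_seq, targets)
        ) / len(targets)
        total += weight * layer_ce
    return total


# Toy example: vocabulary of 3 tokens, 2 target positions.
targets = [0, 2]
per_layer_logits = {
    3: [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # intermediate layer (auxiliary classifier)
    6: [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # final layer (main classifier)
}
layer_weights = {3: 0.4, 6: 0.6}  # assumed weights, chosen only for illustration

loss = decred_style_loss(per_layer_logits, targets, layer_weights)
baseline_loss = decred_style_loss(per_layer_logits, targets, {6: 1.0})
```

Setting the weight of every intermediate layer to zero (as in `baseline_loss`) reduces the objective to the usual final-layer cross-entropy, which is how the sketch relates the regularised model to a plain encoder-decoder baseline.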
The study demonstrates that DeCRED, a simple yet effective regularization technique, can significantly improve the performance of encoder-decoder ASR models, particularly in challenging out-of-domain scenarios. The proposed method offers a promising avenue for building robust and adaptable ASR systems.
This research contributes to the field of ASR by introducing a novel and effective regularization technique that enhances the generalization capabilities of encoder-decoder models. The findings have practical implications for developing ASR systems that can perform reliably in real-world scenarios with diverse acoustic conditions and speaking styles.
The study focuses primarily on English-language ASR and was conducted under a limited computational budget, which restricted both the training data size and the model scale. Future research could explore DeCRED's effectiveness in multilingual settings, on larger datasets, and with different encoder-decoder architectures. Combining DeCRED with other regularization techniques is a further direction that might improve ASR performance.
Key insights extracted from
by Alex... at arxiv.org, 10-24-2024
https://arxiv.org/pdf/2410.17437.pdf