Polok, A., Kesiraju, S., Beneš, K., Burget, L., & Černocký, J. (2024). Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models. arXiv preprint arXiv:2410.17437.
This research paper investigates whether a simple regularization method applied to the decoder module of encoder-decoder Automatic Speech Recognition (ASR) systems can improve their robustness and generalization capabilities, particularly in out-of-domain scenarios.
The authors propose Decoder-Centric Regularisation in Encoder-Decoder (DeCRED), a regularization method for encoder-decoder ASR architectures that adds auxiliary classifiers to the intermediate layers of the decoder during training. They evaluate DeCRED against a baseline encoder-decoder model and against state-of-the-art ASR systems such as Whisper and OWSM. The evaluation covers both in-domain and out-of-domain datasets, with Word Error Rate (WER) as the primary metric. To understand the effect of the regularization scheme, the authors also analyze the internal language model (ILM) of the trained models using zero-attention ILM perplexity estimation.
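The idea of supervising intermediate decoder layers can be illustrated with a minimal sketch. It assumes that each supervised layer has its own classifier producing logits over the vocabulary, and that the training loss is a weighted sum of the per-layer cross-entropy losses. The function names, the layer indices, and the weights below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def sequence_loss(step_logits, targets):
    """Mean token-level cross-entropy over one target sequence."""
    total = sum(cross_entropy(l, t) for l, t in zip(step_logits, targets))
    return total / len(targets)


def deep_supervision_loss(per_layer_logits, targets, layer_weights):
    """Weighted sum of the losses from each supervised decoder layer.

    per_layer_logits: {layer_index: list of logit vectors, one per position}
    layer_weights:    {layer_index: weight} (assumed values, see lead-in)
    """
    return sum(
        weight * sequence_loss(per_layer_logits[layer], targets)
        for layer, weight in layer_weights.items()
    )


# Toy data: vocabulary of 4 tokens, 2 target positions, 2 supervised layers.
targets = [2, 0]
per_layer_logits = {
    3: [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],  # intermediate: uniform
    6: [[0.0, 0.0, 5.0, 0.0], [5.0, 0.0, 0.0, 0.0]],  # final: confident
}
layer_weights = {3: 0.4, 6: 0.6}  # hypothetical weights, not from the paper

loss = deep_supervision_loss(per_layer_logits, targets, layer_weights)
print(f"combined training loss: {loss:.4f}")
```

In this sketch the auxiliary classifiers contribute only to the training loss; at inference time one would read predictions from the final layer alone, so the extra classifiers add no decoding cost.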
The study reports that DeCRED, despite its simplicity, improves the performance of encoder-decoder ASR models, with the clearest gains in challenging out-of-domain scenarios. The authors present the method as a promising route to more robust and adaptable ASR systems.
This research contributes to the field of ASR by introducing a novel and effective regularization technique that enhances the generalization capabilities of encoder-decoder models. The findings have practical implications for developing ASR systems that can perform reliably in real-world scenarios with diverse acoustic conditions and speaking styles.
The study focuses primarily on English ASR and was carried out under a limited computational budget, which restricted both the amount of training data and the model scale. Future research could explore DeCRED's effectiveness in multilingual settings, on larger datasets, and with different encoder-decoder architectures. Combining DeCRED with other regularization techniques is another direction that could further improve ASR performance.
Key insights from the paper by Alex... at arxiv.org, 10-24-2024
https://arxiv.org/pdf/2410.17437.pdf