This paper presents an end-to-end model that combines a speech enhancement module (ConVoiFilter) and an automatic speech recognition (ASR) module to improve speech recognition performance in noisy, crowded environments. The model utilizes a single-channel speech enhancement approach to isolate the target speaker's voice from background noise and then feeds the enhanced audio into the ASR module.
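The data flow described above (enhance first, then recognize) can be sketched as a simple two-stage composition. This is only an illustrative sketch of the inference-time wiring, not the paper's implementation: the class `EnhanceThenRecognize` and the two stand-in functions `toy_noise_gate` and `toy_recognizer` are hypothetical names introduced here. The stand-ins are deliberately trivial placeholders for the ConVoiFilter and ASR modules, whose internals and joint training are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]  # single-channel audio samples


@dataclass
class EnhanceThenRecognize:
    """Chains a speech enhancement stage into an ASR stage (hypothetical wrapper)."""

    enhance: Callable[[Waveform], Waveform]  # stands in for the enhancement module
    recognize: Callable[[Waveform], str]     # stands in for the ASR module

    def __call__(self, mixture: Waveform) -> str:
        enhanced = self.enhance(mixture)     # stage 1: clean up the noisy input
        return self.recognize(enhanced)      # stage 2: transcribe the enhanced audio


def toy_noise_gate(samples: Waveform, threshold: float = 0.1) -> Waveform:
    """Placeholder enhancer (assumption): zero out low-amplitude samples."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def toy_recognizer(samples: Waveform) -> str:
    """Placeholder recognizer (assumption): report how many samples stay active."""
    active = sum(1 for s in samples if s != 0.0)
    return f"<{active} active samples>"


pipeline = EnhanceThenRecognize(enhance=toy_noise_gate, recognize=toy_recognizer)
print(pipeline([0.02, 0.5, -0.03, -0.7, 0.05]))  # prints "<2 active samples>"
```

Keeping the two stages behind plain callables means either placeholder could be swapped for a real model without changing the wiring. In a real system both stages would be neural networks.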




Improving Automatic Speech Recognition in Noisy Environments: A Case Study on Cocktail Party Speech Recognition