Improving Automatic Speech Recognition in Noisy Environments: A Case Study on Cocktail Party Speech Recognition
This paper presents an end-to-end model that combines a speech enhancement module (ConVoiFilter) and an automatic speech recognition (ASR) module to improve speech recognition performance in noisy, crowded environments. The model utilizes a single-channel speech enhancement approach to isolate the target speaker's voice from background noise and then feeds the enhanced audio into the ASR module.