Enhancing Noise Robustness in Speech Emotion Recognition through Two-level Refinement Network and Speech Enhancement
The Two-level Refinement Network (TRNet) leverages a pre-trained speech enhancement module to make speech emotion recognition more robust in noisy environments without compromising performance in clean environments.