The author proposes a progressive learning pipeline that combines a lightweight speech enhancement module with a generative codec module to denoise, dereverberate, and restore speech quality in challenging acoustic environments.
Self-supervised learning (SSL) representations offer limited benefit for on-device speech enhancement systems under low-SNR conditions.
The author proposes an online SpatialNet for long-term streaming speech enhancement, with variants built on masked self-attention (SA), Retention, and Mamba. A strategy of training on short signals and then fine-tuning on long signals is introduced to improve length extrapolation.
The author proposes CMGAN, a speech enhancement model that uses conformer blocks and a metric discriminator to optimize evaluation scores; it outperforms previous models on the Voice Bank+DEMAND dataset.
The author proposes a unified system that integrates generative and predictive decoders to enhance speech quality, demonstrating improved performance in terms of speed, convergence, and overall enhancement.