Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
The paper proposes an online SpatialNet for long-term streaming speech enhancement, with variants built on masked self-attention (SA), Retention, and Mamba. A short-signal training plus long-signal fine-tuning strategy is introduced to improve length extrapolation, that is, the ability to process signals much longer than those seen during training.
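
To illustrate why a masked (causal) self-attention layer suits streaming, here is a minimal pure-Python sketch. It is not the paper's implementation: the function name `causal_self_attention`, the single attention head, and the use of the input frames directly as queries, keys, and values (identity projections) are all assumptions made to keep the example short. The point it shows is that the output for frame t depends only on frames 0..t, so outputs already emitted do not change when later frames arrive.

```python
import math


def causal_self_attention(frames):
    """Single-head self-attention with a causal mask (illustrative only).

    frames: list of T feature vectors (lists of floats), one per time frame.
    Assumption: queries, keys, and values are the frames themselves.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t, query in enumerate(frames):
        # Causal mask: frame t may only attend to frames 0..t.
        visible = frames[: t + 1]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in visible]
        # Numerically stable softmax over the visible frames.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, visible)) for d in range(dim)]
        )
    return outputs


# Hypothetical toy input: four frames with two features each.
signal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full = causal_self_attention(signal)
prefix = causal_self_attention(signal[:2])

# Outputs for the first two frames are identical whether or not later frames exist.
streaming_ok = all(
    math.isclose(a, b) for t in range(2) for a, b in zip(full[t], prefix[t])
)
```

In this naive form the cost of frame t grows with t, because every past frame is revisited. That growth is one reason recurrent-style alternatives such as Retention and Mamba, which carry a fixed-size state, are attractive for long-term streaming.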