Robust and Efficient Self-Supervised Learning for Speaker and Noise-Invariant Speech Representations
R-Spin is a data-efficient, domain-specific self-supervision method that learns speaker- and noise-invariant speech representations by predicting discrete acoustic units, which improves robustness to diverse acoustic environments.
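To make the idea of predicting discrete acoustic units concrete, here is a minimal sketch of how a unit-prediction objective might look. It is not the authors' implementation: the nearest-centroid unit assignment, the toy distance-based predictor, the random stand-in features, and every name below (`assign_units`, `unit_prediction_loss`, `codebook`) are assumptions made for illustration only.

```python
import numpy as np

def assign_units(features, codebook):
    """Map each frame (T, D) to the index of its nearest codebook vector (K, D)."""
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def unit_prediction_loss(logits, target_units):
    """Mean cross-entropy between frame-level logits (T, K) and integer unit IDs (T,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target_units)), target_units].mean()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # hypothetical: 8 discrete units, 4-dim features
clean = rng.normal(size=(10, 4))     # stand-in for features of a clean utterance
noisy = clean + 0.1 * rng.normal(size=clean.shape)  # stand-in for a perturbed view

# Discrete targets come from the clean view (assumption for this sketch).
targets = assign_units(clean, codebook)

# Toy predictor: negative squared distance of the perturbed view to each codeword.
logits = -((noisy[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

loss = unit_prediction_loss(logits, targets)
```

In this sketch, the loss is small when the perturbed view still maps to the same units as the clean view, which is the intuition behind training for invariance to speaker and noise variation. A real system would replace the toy predictor with a trained encoder and prediction head.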