Core Concepts
Self-supervised pretraining enhances the noise robustness of keyword spotting (KWS) models, outperforming purely supervised training.
Abstract
This work explores self-supervised learning for noise-robust keyword spotting. It compares different pretraining approaches, including Data2Vec, and evaluates their impact on model robustness under noisy conditions. Pretraining and fine-tuning on clean data surpasses supervised training on clean data in all testing conditions, especially at SNRs above 5 dB. Using noisy data during pretraining, particularly with the Data2Vec-denoising approach, further enhances robustness in noisy environments. The study systematically investigates a range of pretraining setups and model sizes, and the results demonstrate the effectiveness of self-supervised pretraining for improving noise robustness.
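The noisy training and testing conditions above are characterized by signal-to-noise ratio (SNR). As a point of reference, the sketch below shows one standard way to mix a noise clip into clean speech at a target SNR; the function and array names are illustrative assumptions, not the paper's actual data pipeline.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    noise = noise[: len(speech)]              # align lengths (assumes noise is long enough)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12     # avoid division by zero
    # Choose a gain so that 10 * log10(p_speech / (gain**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise


# Example: a 5 dB mixture; the summary above reports the clearest gains of
# clean pretraining over supervised training for SNRs above this level.
clean = np.random.randn(16000)    # placeholder: 1 s of 16 kHz audio
babble = np.random.randn(16000)   # placeholder noise clip
noisy_5db = mix_at_snr(clean, babble, snr_db=5.0)
```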
Index:
- Abstract
- Introduction
- Methodology and Data Sets
- Experiments
- Results
- Conclusions
- References
Stats
"Models of three different sizes are pretrained using different pretraining approaches."
"Pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions."
"Using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions."
Quotes
"Pretraining and fine-tuning on clean data yields higher accuracy than supervised training on clean data in all testing conditions."
"Using noisy data for the student and clean data for the teacher in Data2Vec pretraining yields the best performing models in noisy conditions."