The content explores the use of self-supervised learning for noise-robust keyword spotting models. It compares different pretraining approaches, including Data2Vec, and evaluates their impact on model robustness in noisy conditions. The study finds that self-supervised pretraining followed by fine-tuning on clean data surpasses purely supervised training in all test conditions, especially at SNRs above 5 dB. Using noisy data for pretraining, particularly with the Data2Vec-denoising approach, significantly enhances robustness in noisy environments. The study systematically investigates various pretraining setups and model sizes, presenting results that demonstrate the effectiveness of self-supervised pretraining in improving noise robustness.
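The Data2Vec-denoising idea described above can be illustrated with a toy sketch: a slowly updated (EMA) teacher encodes the clean signal to produce regression targets, while the student encodes a noise-corrupted version of the same signal and is trained to match those targets. The following minimal NumPy example is an illustration under stated assumptions, not the paper's implementation: the encoder is a single linear map standing in for a real audio network, masking is omitted, and all hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_frames = 8, 16

# Toy stand-in for an audio encoder: a single linear map.
def encode(x, w):
    return x @ w

# Corrupt clean features with additive Gaussian noise at a target SNR (dB).
def add_noise(clean, snr_db):
    noise = rng.normal(size=clean.shape)
    sig_pow = np.mean(clean ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean = rng.normal(size=(n_frames, dim))  # "clean" audio features
noisy = add_noise(clean, snr_db=5.0)      # student input at 5 dB SNR

student_w = rng.normal(size=(dim, dim)) * 0.1
teacher_w = student_w.copy()              # teacher starts as a copy of the student

lr, ema = 0.3, 0.999
losses = []
for _ in range(200):
    target = encode(clean, teacher_w)     # teacher sees the clean signal
    pred = encode(noisy, student_w)       # student sees the noisy signal
    losses.append(np.mean((pred - target) ** 2))
    # Gradient of the MSE w.r.t. the student weights, plain SGD step.
    grad = 2 * noisy.T @ (pred - target) / pred.size
    student_w -= lr * grad
    # Teacher tracks the student via an exponential moving average.
    teacher_w = ema * teacher_w + (1 - ema) * student_w
```

Because the student only ever sees corrupted input while regressing targets derived from clean input, it is pushed to produce noise-invariant representations, which is the intuition behind the robustness gains reported in the study.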
Key insights drawn from the original content by Jaco... at arxiv.org, 03-28-2024
https://arxiv.org/pdf/2403.18560.pdf