The paper explores self-supervised learning for noise-robust keyword spotting. It compares different pretraining approaches, including Data2Vec, and evaluates their impact on model robustness in noisy conditions. The study finds that pretraining and fine-tuning on clean data surpasses purely supervised training in all test conditions, especially at SNRs above 5 dB. Pretraining on noisy data, particularly with the Data2Vec-denoising approach, further improves robustness in noisy environments. The study systematically varies pretraining setups and model sizes, and the results demonstrate the effectiveness of self-supervised pretraining for improving noise robustness.
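The Data2Vec-denoising idea summarized above — a student network fed a noisy signal regresses latent targets produced by an EMA teacher fed the clean version — can be sketched in miniature. Everything below (the one-layer tanh "encoder", dimensions, noise level, learning rate, EMA decay) is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    # Toy one-layer nonlinear "encoder" standing in for a transformer stack.
    return np.tanh(x @ w)

# Hypothetical sizes: 16-dim input frames, 8-dim latent representations.
d_in, d_lat = 16, 8
w_student = rng.normal(scale=0.1, size=(d_in, d_lat))
w_teacher = w_student.copy()  # teacher initialized as a copy of the student
ema_decay = 0.999
lr = 0.05
losses = []

for step in range(200):
    clean = rng.normal(size=(32, d_in))                  # batch of clean frames
    noisy = clean + 0.5 * rng.normal(size=clean.shape)   # additive noise

    # Denoising twist: the teacher sees CLEAN input, the student sees NOISY
    # input, so the student must predict clean-derived latents from noise.
    target = encoder(clean, w_teacher)  # no gradient flows through the teacher
    pred = encoder(noisy, w_student)

    # L2 regression loss between student predictions and teacher targets,
    # with a manual gradient step through the tanh.
    err = pred - target
    losses.append(np.mean(err ** 2))
    grad_pre = err * (1.0 - pred ** 2) / err.size
    w_student -= lr * (noisy.T @ grad_pre)

    # The teacher tracks the student via an exponential moving average.
    w_teacher = ema_decay * w_teacher + (1 - ema_decay) * w_student
```

The student cannot reproduce the noise, so minimizing this loss pushes it toward representations that depend on the underlying clean signal — the intuition behind why this pretraining objective helps in noisy test conditions.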
Key insights distilled from arxiv.org, by Jaco..., 03-28-2024. Source: https://arxiv.org/pdf/2403.18560.pdf