The paper introduces a new task called "Continual Event Detection from Speech" (CEDS), which aims to sequentially learn and recognize new speech event types while retaining previously learned knowledge. This task presents two key challenges: catastrophic forgetting and the disentanglement of semantic and acoustic events.
To address these challenges, the authors propose a novel method called "Double Mixture". This approach combines:
A mixture of speech experts, in which each expert specializes in a specific task and the model dynamically weights the experts' outputs so that knowledge acquired on earlier tasks is preserved as new tasks arrive.
A mixture of memory, in which the model stores mixed speech samples containing both semantic and acoustic events and replays them during training to strengthen collaboration among the speech experts and improve generalization (a minimal code sketch of both components follows below).
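The sketch below illustrates how these two components could fit together. It is written under our own assumptions, not from the authors' code: a frozen speech encoder is assumed to produce fixed-size embeddings, each expert is a small feed-forward head added when a new task begins, a soft gate mixes the expert outputs, and the memory is a simple replay buffer of stored embeddings. All class and method names here (DoubleMixtureSketch, MixtureMemory, add_expert, etc.) are hypothetical.

```python
# Illustrative sketch only; not the paper's implementation.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleMixtureSketch(nn.Module):
    """Mixture of per-task speech experts combined by a learned soft gate."""

    def __init__(self, embed_dim: int, num_event_types: int):
        super().__init__()
        self.embed_dim = embed_dim
        self.experts = nn.ModuleList()   # one expert appended per task
        self.gate = None                 # rebuilt each time an expert is added
        self.classifier = nn.Linear(embed_dim, num_event_types)

    def add_expert(self) -> None:
        """Call at the start of each new task: add an expert, widen the gate."""
        self.experts.append(nn.Sequential(
            nn.Linear(self.embed_dim, self.embed_dim),
            nn.ReLU(),
        ))
        self.gate = nn.Linear(self.embed_dim, len(self.experts))

    def forward(self, speech_embedding: torch.Tensor) -> torch.Tensor:
        # speech_embedding: (batch, embed_dim) from an assumed frozen speech encoder.
        weights = F.softmax(self.gate(speech_embedding), dim=-1)          # (batch, n_experts)
        expert_outs = torch.stack(
            [expert(speech_embedding) for expert in self.experts], dim=1  # (batch, n_experts, dim)
        )
        mixed = (weights.unsqueeze(-1) * expert_outs).sum(dim=1)          # gated combination
        return self.classifier(mixed)                                     # event-type logits


class MixtureMemory:
    """Small rehearsal buffer of mixed (semantic + acoustic) speech samples."""

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self.buffer = []  # list of (embedding, label) pairs

    def store(self, embedding: torch.Tensor, label: int) -> None:
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(random.randrange(len(self.buffer)))  # evict a random old sample
        self.buffer.append((embedding.detach(), label))

    def replay(self, batch_size: int = 8):
        """Sample stored examples to interleave with the current task's batches."""
        if not self.buffer:
            return None
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        embeddings = torch.stack([e for e, _ in batch])
        labels = torch.tensor([y for _, y in batch])
        return embeddings, labels


# Usage: start a task, add its expert, and classify a (dummy) batch of embeddings.
model = DoubleMixtureSketch(embed_dim=512, num_event_types=40)
memory = MixtureMemory()
model.add_expert()                         # task 1 begins
logits = model(torch.randn(4, 512))        # random tensors stand in for encoder output
```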
The authors conduct extensive experiments on three benchmark datasets (Speech-ACE05, Speech-MAVEN, and ESC-50) and two combined datasets (Speech Splicing and Speech Overlaying) to evaluate the performance of the proposed method. The results show that the Double Mixture approach outperforms various continual learning baselines in terms of average accuracy and forgetting rate, demonstrating its effectiveness in overcoming catastrophic forgetting and managing complex real-world speech scenarios with varying event combinations.
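For concreteness, the two reported metrics can be computed as sketched below, assuming the standard continual-learning definitions (accuracy averaged over all tasks after the final training stage, and forgetting measured as the drop from each task's best accuracy to its final accuracy); the paper's exact formulas may differ.

```python
# Hedged sketch of the evaluation metrics; acc[i][j] is accuracy on task j
# measured after training on task i. Values below are made-up examples.
def average_accuracy(acc: list[list[float]]) -> float:
    """Mean accuracy over all tasks after the final training stage."""
    final = acc[-1]
    return sum(final) / len(final)


def forgetting_rate(acc: list[list[float]]) -> float:
    """Average drop from each earlier task's best accuracy to its final accuracy."""
    num_tasks = len(acc)
    drops = []
    for j in range(num_tasks - 1):  # the last task has not yet had a chance to be forgotten
        best = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best - acc[-1][j])
    return sum(drops) / len(drops) if drops else 0.0


acc = [
    [0.80, 0.00, 0.00],
    [0.72, 0.85, 0.00],
    [0.70, 0.80, 0.88],
]
print(average_accuracy(acc))  # (0.70 + 0.80 + 0.88) / 3 ≈ 0.793
print(forgetting_rate(acc))   # ((0.80 - 0.70) + (0.85 - 0.80)) / 2 = 0.075
```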