This paper explores methods for expanding the language coverage of a pre-trained multilingual ASR foundation model, Whisper, to include new low-resource languages. The key insights are:
Examining the zero-shot ASR and speech translation capabilities of the Whisper model on unseen languages, which reveals the difficulty of applying the model directly to low-resource languages it was not trained on (a zero-shot decoding sketch follows this list).
Comparing efficient fine-tuning approaches, including Soft Language Code Tuning (SLCT), Soft Prompt Tuning (SPT), and Low-Rank Adaptation (LoRA), for integrating new languages while preserving performance on existing ones. The results show that while direct fine-tuning achieves the best performance on the new language, it can lead to catastrophic forgetting of previously supported languages (a LoRA sketch follows this list).
Adopting Elastic Weight Consolidation (EWC) as a regularization technique to mitigate catastrophic forgetting during fine-tuning. EWC can help maintain performance on specific target languages, but balancing the trade-off between learning the new language and preserving the old ones becomes difficult when the parameters important to each language overlap heavily (an EWC sketch follows this list).
Using the Fisher overlap between languages as an analytical tool to assess the risk of forgetting, which provides insight into the challenges of integrating new languages into the foundation model (a Fisher-overlap sketch follows this list).
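To make the zero-shot evaluation concrete, here is a minimal decoding sketch with the openai-whisper package. The audio path, checkpoint size, and surrogate language code are placeholders for illustration, not the paper's exact evaluation setup; the point is that Whisper only exposes language codes it was trained on, so an unseen language has to be decoded under a related seen code or left to auto-detection.

```python
import whisper

# Minimal zero-shot probe: decode an utterance in an unseen language with an
# off-the-shelf Whisper checkpoint. "audio.wav" and the language code "sw"
# are placeholders; Whisper only accepts language codes from its training
# set, which is exactly where zero-shot performance on new languages degrades.
model = whisper.load_model("large-v2")

# ASR: transcribe the audio under a (surrogate) source-language code.
asr = model.transcribe("audio.wav", task="transcribe", language="sw")
print("ASR:", asr["text"])

# Speech translation: translate the same audio into English.
st = model.transcribe("audio.wav", task="translate")
print("ST :", st["text"])
```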
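For the parameter-efficient fine-tuning comparison, a LoRA setup on top of Hugging Face transformers and peft might look like the sketch below; the checkpoint name, rank, and target modules are illustrative assumptions rather than the paper's configuration.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load a pre-trained multilingual Whisper checkpoint (size is illustrative).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Attach low-rank adapters to the attention projections; only these small
# matrices are trained, so the original weights stay frozen and can be
# recovered by simply dropping the adapters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```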
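EWC penalizes movement of the parameters that the Fisher information marks as important for the already-supported languages. A minimal PyTorch sketch of the diagonal Fisher estimate and the quadratic penalty is shown below; the regularization weight `lam` and the `loss_fn`/`dataloader` hooks are placeholders.

```python
import torch

def diagonal_fisher(model, dataloader, loss_fn):
    """Estimate the diagonal Fisher information from gradients on data from
    the previously seen languages (loss_fn maps (model, batch) -> scalar)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    model.eval()
    for batch in dataloader:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(dataloader) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam):
    """Quadratic EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the parameters before fine-tuning on the new language."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During fine-tuning on the new language the training objective becomes
#   total_loss = asr_loss + ewc_penalty(model, fisher, old_params, lam)
```

Larger values of `lam` protect the existing languages more strongly at the cost of adaptation to the new one, which is the trade-off the paper reports as hard to balance when parameter importance overlaps across languages.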
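The same diagonal Fisher estimates can be reused to compute the Fisher overlap between two languages. The sketch below follows the Fréchet-style definition common in the EWC literature (Fishers normalized to unit trace); whether the paper uses exactly this variant is an assumption.

```python
import torch

def fisher_overlap(fisher_a, fisher_b):
    """Overlap between two diagonal Fisher matrices, each normalized to unit
    trace: 1 means the two languages rely on the same parameters (high risk
    of forgetting), 0 means they use disjoint parameters."""
    fa = torch.cat([f.flatten() for f in fisher_a.values()])
    fb = torch.cat([f.flatten() for f in fisher_b.values()])
    fa, fb = fa / fa.sum(), fb / fb.sum()
    frechet_sq = (fa + fb - 2.0 * torch.sqrt(fa * fb)).sum()
    return 1.0 - 0.5 * frechet_sq
```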
The findings highlight the importance of developing efficient and effective strategies for expanding the language coverage of pre-trained multilingual ASR models, especially for low-resource languages, while preserving the performance on existing languages.
Key insights extracted from the paper by Mengjie Qian et al. (arxiv.org, 09-25-2024): https://arxiv.org/pdf/2407.06800.pdf