Core Concepts
Cross-lingual transfer learning can lead to catastrophic forgetting of previously acquired knowledge in the source language. This study compares different cross-lingual transfer strategies and fine-tuning approaches to measure and mitigate this effect.
Abstract
This study investigates the impact of different cross-lingual transfer strategies and fine-tuning approaches on the phenomenon of catastrophic forgetting in language models. The key findings are:
Intermediate training (IT), which fine-tunes on languages sequentially, achieves better cross-lingual transfer performance than cross-lingual validation (CLV), which uses the target language only during validation (both strategies are sketched in code after this list).
However, the CLV strategy better mitigates catastrophic forgetting and retains more knowledge from the source language (English) compared to the IT strategy, especially when performing multiple cross-lingual transfers.
Retention of English knowledge is better with the CLV strategy, while for the other languages, and across several successive cross-lingual transfer steps, the IT strategy causes less forgetting.
Adapter fine-tuning is more computationally efficient than full-model fine-tuning, but the latter performs better overall (a minimal adapter sketch also follows this list).
The size of the validation set in the CLV strategy significantly impacts the performance of adapter fine-tuning, but has a minimal effect on full model fine-tuning.
The authors provide open-source cross-lingual adapters for multiple tasks in three less-resourced languages, which can be reused by other researchers.
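As a rough illustration of the two transfer strategies, the sketch below contrasts them in Python. The fine_tune helper and the data dictionary are hypothetical placeholders; the sketch only shows where the target language enters each pipeline, not the paper's exact training setup.

```python
def fine_tune(model, train, dev):
    """Placeholder for task fine-tuning with early stopping on `dev`;
    the real training loop is not reproduced here."""
    return model


def intermediate_training(model, langs, data):
    """IT: fine-tune on one language after another (e.g. English, then each
    further language in turn), validating in the language currently trained on."""
    for lang in langs:                           # sequential cross-lingual steps
        model = fine_tune(model,
                          train=data[lang]["train"],
                          dev=data[lang]["dev"])
    return model


def cross_lingual_validation(model, src, tgt, data):
    """CLV: fine-tune only on the source language (English), but select
    checkpoints / early-stop on a development set in the target language."""
    return fine_tune(model,
                     train=data[src]["train"],
                     dev=data[tgt]["dev"])       # target language enters only via validation
```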
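The contrast between adapter and full-model fine-tuning can likewise be sketched. This is a generic bottleneck-adapter module in PyTorch (in the style of Houlsby et al.), not the exact adapter architecture or sizes used in the paper; hidden_size and bottleneck are illustrative values.

```python
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: a small down-/up-projection with a residual connection."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x):
        # The residual keeps the frozen pre-trained representation intact.
        return x + self.up(self.act(self.down(x)))


def set_adapter_trainable(encoder, adapters):
    """Adapter fine-tuning: freeze the pre-trained encoder and update only the
    adapters (plus a task head). Full fine-tuning would instead leave every
    encoder parameter trainable, which is costlier but performs better overall."""
    for p in encoder.parameters():
        p.requires_grad = False          # frozen backbone
    for adapter in adapters:
        for p in adapter.parameters():
            p.requires_grad = True       # only the small adapter modules are updated
```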
Stats
"The cross-lingual transfer is a promising technique to solve tasks in less-resourced languages."
"LLMs are pre-trained with self-supervised learning where the idea is to learn the data distribution without explicit labels, e.g., models are asked to solve fill-a-gap tasks in natural language settings (Masked Language Modeling (MLM))."
"Conneau and Lample (2019) introduced the task of Translated Language Modelling (TLM), where masked words are predicted in two parallel sentences in different languages, improving the language alignment."
Quotes
"When we transfer knowledge for a specific task or a set of tasks from one language to another, we denote the process as cross-lingual transfer."
"A common problem in transfer learning where knowledge is transferred to another problem is catastrophic forgetting (CF) (McCloskey and Cohen, 1989; Kemker et al., 2018) where models forget previously acquired knowledge when the model is adapted to a novel task."