The authors explore intrinsic language-specific subspaces in fine-tuning multilingual neural machine translation (MNMT) models. They observe that fine-tuning for a given language occurs within its intrinsic language-specific subspace, which requires only a tiny fraction of the full model's parameters.
To leverage this insight, the authors propose Language-Specific LoRA (LSLo), which models the intrinsic language-specific subspaces using multiple sparsely activated LoRA modules. Furthermore, they introduce architecture learning techniques, including Weight Learning and Layer-wise Cross-Language Pruning, to determine the optimal structure and size of the intrinsic subspaces for each language.
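To make the idea concrete, the following is a minimal PyTorch sketch of a language-specific LoRA layer: a frozen base linear layer augmented with one LoRA adapter per language, where only the adapter of the current target language is activated. The class name, rank assignments, and language-routing interface are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class LSLoLinear(nn.Module):
    """A frozen linear layer with one LoRA adapter per language.

    Only the adapter of the current language is activated, so each language
    is fine-tuned inside its own low-rank subspace (hypothetical sketch).
    """

    def __init__(self, base: nn.Linear, ranks: dict, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained MNMT weights stay frozen

        self.alpha = alpha
        self.lora_A = nn.ModuleDict()
        self.lora_B = nn.ModuleDict()
        for lang, r in ranks.items():  # per-language rank ~ subspace size
            self.lora_A[lang] = nn.Linear(base.in_features, r, bias=False)
            self.lora_B[lang] = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(self.lora_B[lang].weight)  # adapters start as a no-op

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Sparse activation: only the current language's adapter is used.
        r = self.lora_A[lang].out_features
        delta = self.lora_B[lang](self.lora_A[lang](x)) * (self.alpha / r)
        return self.base(x) + delta


# Example usage: give a (hypothetical) low-resource language a larger rank,
# i.e. a larger intrinsic subspace, than high-resource languages.
layer = LSLoLinear(nn.Linear(512, 512), ranks={"en": 4, "de": 4, "xh": 32})
out = layer(torch.randn(2, 10, 512), lang="xh")
```

The per-language rank dictionary stands in for the subspace sizes that the paper's Weight Learning and Layer-wise Cross-Language Pruning procedures would determine automatically.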
Experiments on the FLORES-101 dataset show that the size of the intrinsic subspace is strongly correlated with a language's resource level: high- and medium-resource languages can be fine-tuned within a very small parameter subspace, while low-resource languages require larger subspaces. By fine-tuning each language in its respective intrinsic subspace, the proposed method outperforms full-parameter fine-tuning by up to 2.25 spBLEU while reducing the trainable parameters to only 7% of the original model.
The authors also analyze the effectiveness of their approach, finding that the model's focus shifts from the source side to the target side near the top layers of the encoder, and that the fully connected layers are the most crucial for language-specific learning. Overall, this work demonstrates the potential of exploiting intrinsic language-specific subspaces to achieve efficient and effective fine-tuning of MNMT models.