The study explores how to enhance the zero-shot performance of multilingual large language models (MLLMs) in non-English languages by leveraging their alignment capability between English and non-English languages.
The key findings are:
Specific features exhibit large magnitudes and are active predominantly when few-shot translation demonstrations are given as input. These large-magnitude features are relevant to the translation performance of MLLMs.
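One way to surface such features, shown here as a rough illustration rather than the paper's exact procedure, is to feed a few-shot translation demonstration through the model and flag hidden dimensions whose average activation magnitude is far above the typical value. The model name, layer choice, and threshold factor below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: model, layer, and threshold are illustrative choices.
model_name = "facebook/xglm-564M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A few-shot translation demonstration (English -> Spanish).
demo = (
    "English: The cat sleeps. Spanish: El gato duerme.\n"
    "English: I like tea. Spanish: Me gusta el té.\n"
    "English: Good morning. Spanish:"
)

inputs = tokenizer(demo, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Mean absolute activation per hidden dimension at a middle layer.
hidden = outputs.hidden_states[len(outputs.hidden_states) // 2]  # (1, seq_len, d_model)
feature_magnitude = hidden.abs().mean(dim=(0, 1))

# Flag dimensions whose magnitude is far above the median (assumed factor of 10).
threshold = 10.0 * feature_magnitude.median()
large_magnitude_dims = torch.nonzero(feature_magnitude > threshold).flatten()
print("Large-magnitude feature dimensions:", large_magnitude_dims.tolist())
```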
Pruning MLLMs (XGLM and mGPT) by retaining the weights associated with the large-magnitude features elicited by translation demonstrations improves their zero-shot performance in non-English languages compared to the original unpruned models. However, this pruning strategy did not improve the performance of BLOOM.
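A pruning step of this kind could be realized with a Wanda-style weight-times-activation score in which the activation norms are collected from the translation demonstrations. The sketch below is not necessarily the paper's exact metric; the restriction to a single Linear layer, the per-row selection, and the sparsity ratio are assumptions made for brevity.

```python
import torch

def prune_by_translation_activations(linear: torch.nn.Linear,
                                     demo_activations: torch.Tensor,
                                     sparsity: float = 0.5) -> None:
    """Zero out the weights least associated with translation-demo features.

    linear:            a Linear layer of the MLLM (weight shape: out x in)
    demo_activations:  inputs to this layer recorded while running the
                       few-shot translation demonstrations (n_tokens x in)
    sparsity:          fraction of weights to remove (assumed value)
    """
    # Per-input-dimension activation norm over the translation demonstrations.
    act_norm = demo_activations.norm(p=2, dim=0)           # shape: (in,)

    # Wanda-style importance score: |weight| scaled by the activation norm
    # of the feature it reads from.
    score = linear.weight.abs() * act_norm.unsqueeze(0)    # shape: (out, in)

    # Keep the highest-scoring weights in each row, zero the rest.
    n_prune = int(sparsity * score.shape[1])
    prune_idx = torch.argsort(score, dim=1)[:, :n_prune]
    mask = torch.ones_like(linear.weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    with torch.no_grad():
        linear.weight *= mask
```

Keeping only the highest-scoring weights preserves the connections that read from the large-magnitude translation features while discarding the rest.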
BLOOM was trained on both multilingual natural-language and programming-language texts, giving it the capability to generate code. To address this, the pruning metric was reformulated to selectively prune the weights associated with features activated during programming-language generation. This improved BLOOM's multilingual zero-shot performance.
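The summary does not spell out the reformulated metric, so the following is only a plausible sketch: it reuses the same weight-times-activation score but collects the activations from programming-language text and inverts the selection, removing the weights that score highest there. The function name and sparsity value are hypothetical.

```python
import torch

def prune_code_associated_weights(linear: torch.nn.Linear,
                                  code_activations: torch.Tensor,
                                  sparsity: float = 0.1) -> None:
    """Remove the weights most associated with programming-language features.

    code_activations: inputs to this layer recorded while the model generates
                      programming-language text (n_tokens x in)
    sparsity:         fraction of weights to remove (assumed value)
    """
    act_norm = code_activations.norm(p=2, dim=0)            # shape: (in,)
    score = linear.weight.abs() * act_norm.unsqueeze(0)     # higher = more code-related

    # This time prune the highest-scoring weights, i.e. those tied to
    # features that light up during code generation.
    n_prune = int(sparsity * score.shape[1])
    prune_idx = torch.argsort(score, dim=1, descending=True)[:, :n_prune]
    mask = torch.ones_like(linear.weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    with torch.no_grad():
        linear.weight *= mask
```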
The pruned models demonstrated higher cross-lingual consistency between English and non-English languages, indicating they are better able to leverage English inference capabilities for non-English tasks.
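Cross-lingual consistency can be read here as the rate at which the model gives the same answer to an English example and its non-English counterpart; under that assumed definition, it can be estimated as follows.

```python
def cross_lingual_consistency(preds_en: list, preds_xx: list) -> float:
    """Fraction of parallel test items where the English-input prediction
    and the non-English-input prediction agree (assumed definition)."""
    assert len(preds_en) == len(preds_xx)
    agree = sum(p_en == p_xx for p_en, p_xx in zip(preds_en, preds_xx))
    return agree / len(preds_en)

# Example: labels predicted for parallel English / Spanish prompts.
print(cross_lingual_consistency(["pos", "neg", "pos"], ["pos", "neg", "neg"]))  # 0.666...
```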
Overall, the study shows that selectively pruning weights based on the large-magnitude features from translation demonstrations can enhance the multilingual zero-shot performance of large language models by accentuating their alignment capability between languages.
Key ideas extracted from https://arxiv.org/pdf/2409.16911.pdf by Hwichan Kim et al. (arxiv.org, 09-26-2024).