Key Concepts
Aligning the reasoning processes in non-dominant languages with the dominant language (English) can effectively enhance the multilingual reasoning capabilities of large language models.
Summary
The paper proposes a Multilingual-Alignment-as-Preference Optimization (MAPO) framework to improve the multilingual reasoning abilities of large language models. The key idea is to align the reasoning processes in non-dominant languages with the dominant language (English) during the optimization process.
The framework consists of two stages:
- Preference Estimation: A well-trained translation model is used to estimate the alignment between the reasoning processes in non-dominant and dominant languages. The translation probability is used as the preference score, where higher scores indicate better alignment with the dominant language.
- Preference Optimization: The preference scores are then used to optimize the model's reasoning in non-dominant languages with a preference optimization algorithm, either Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO).
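The preference-estimation step can be sketched as follows. This is a minimal illustration, not MAPO's implementation. It assumes a translation model has already produced per-token log-probabilities for the English reasoning given each non-English reasoning candidate. It uses their length-normalized average as the preference score and ranks every pair of candidates for the same question. The names `alignment_score` and `build_preference_pairs` and the all-pairs scheme are hypothetical.

```python
from itertools import combinations

def alignment_score(token_logprobs):
    """Hypothetical preference score: mean per-token log-probability that an
    (assumed) translation model assigns to the English reasoning when given
    a non-English reasoning candidate as the source."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)

def build_preference_pairs(candidates):
    """candidates: list of (reasoning_text, token_logprobs) for one question.
    Returns (chosen, rejected) pairs; chosen has the higher alignment score."""
    scored = [(text, alignment_score(lps)) for text, lps in candidates]
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        if score_a > score_b:
            pairs.append((a, b))
        elif score_b > score_a:
            pairs.append((b, a))
        # Ties carry no preference signal and are skipped.
    return pairs

# Toy log-probabilities standing in for translation-model output.
candidates = [
    ("reasoning_A", [-0.1, -0.2, -0.3]),
    ("reasoning_B", [-1.0, -0.8]),
    ("reasoning_C", [-0.5, -0.5, -0.5, -0.5]),
]
for chosen, rejected in build_preference_pairs(candidates):
    print(chosen, ">", rejected)
```

Length normalization is an assumed design choice in this sketch. It keeps shorter reasoning chains from scoring higher simply because they have fewer tokens whose log-probabilities are summed.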
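For the DPO variant, preference pairs feed a pairwise loss. The sketch below is the standard DPO objective for a single pair in plain Python, not code from the MAPO paper. The function names and the `beta=0.1` default are assumptions. It omits batching and gradients, and it does not cover the PPO variant, where the preference score would presumably serve as the reward signal instead.

```python
import math

def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a whole response under the
    trainable policy or the frozen reference model."""
    chosen_shift = policy_chosen - ref_chosen        # log-ratio for the preferred response
    rejected_shift = policy_rejected - ref_rejected  # log-ratio for the dispreferred response
    logits = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)) equals softplus(-x).
    return _softplus(-logits)

# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy raised the chosen response and lowered the rejected one: the loss drops.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The softplus form avoids overflowing `math.exp` when the log-ratio gap is large in either direction.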
The experiments are conducted on three challenging multilingual reasoning benchmarks: MSVAMP, MGSM, and MNumGLUESub. The results show that MAPO significantly improves the multilingual reasoning capabilities of various base models, with accuracy improvements of up to 16.2%, 6.1%, and 13.3% on the three benchmarks, respectively. The improvements are particularly notable on the out-of-domain MSVAMP dataset, demonstrating the generalizability of the approach.
The analysis further confirms that the key to the performance gains is the alignment of reasoning processes across languages, as evidenced by the improved Answer Consistency Ratio (ACR) and reduced Perplexity (PPL) scores. The framework is also shown to be robust across different translation models used for preference estimation.
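The two analysis metrics can be illustrated with small helpers. The ACR definition below is an assumption and may differ from the paper's exact formula: it is the fraction of questions answered correctly in English that are also answered correctly in the target language. `perplexity` is the standard formula, the exponential of the negative mean token log-probability; which texts the paper scores with it is not specified in this summary. Both function names are hypothetical.

```python
import math

def answer_consistency_ratio(english_correct, target_correct):
    """Assumed ACR: share of English-correct questions also solved in the target language."""
    if len(english_correct) != len(target_correct):
        raise ValueError("both lists must cover the same questions")
    english_ids = [i for i, ok in enumerate(english_correct) if ok]
    if not english_ids:
        return 0.0
    both = sum(1 for i in english_ids if target_correct[i])
    return both / len(english_ids)

def perplexity(token_logprobs):
    """Standard perplexity: exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

english = [True, True, False, True]
target = [True, False, True, True]
print(answer_consistency_ratio(english, target))  # 2 of the 3 English-correct questions
print(perplexity([math.log(0.5)] * 4))
```

Under this assumed definition, a higher ACR means the target language fails less often on questions the model can already solve in English, which is the cross-lingual consistency the summary describes.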
Statistics
The number of students who suggested adding mashed potatoes is 182.
The number of students who suggested adding bacon is 182 + 166 = 348.
Quotes
"Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data."
"To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language."