The paper proposes a Multilingual-Alignment-as-Preference Optimization (MAPO) framework to improve the multilingual reasoning abilities of large language models. The key idea is to align the reasoning processes in non-dominant languages with the dominant language (English) during the optimization process.
The framework consists of two stages: (1) preference estimation, in which an off-the-shelf translation model scores how well a reasoning chain in a non-dominant language aligns with its dominant-language (English) counterpart, and (2) preference optimization, in which these alignment scores serve as preference signals for fine-tuning the model.
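The preference-estimation stage can be sketched as follows. This is a minimal illustration, not the paper's implementation: `alignment_score` is a toy stand-in for a translation model's likelihood of producing the English reasoning from the non-English one, and the function names are assumptions.

```python
# Hypothetical sketch of MAPO-style preference estimation.
# A real system would use a translation model's length-normalized
# log-probability P(English reasoning | non-English reasoning);
# here a toy token-overlap ratio stands in for that score.

def alignment_score(non_en_reasoning: str, en_reasoning: str) -> float:
    """Toy proxy for translation-model alignment likelihood."""
    non_en = set(non_en_reasoning.split())
    en = set(en_reasoning.split())
    return len(non_en & en) / max(len(en), 1)

def build_preference_pairs(candidates: list[str], en_reasoning: str) -> dict:
    """Rank sampled non-English reasoning chains by alignment with the
    English chain; the best-aligned is 'chosen', the worst 'rejected'."""
    ranked = sorted(candidates,
                    key=lambda c: alignment_score(c, en_reasoning),
                    reverse=True)
    return {"chosen": ranked[0], "rejected": ranked[-1]}
```

The resulting chosen/rejected pairs would then feed a standard preference-optimization objective (e.g., DPO-style training) in the second stage.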
The experiments are conducted on three challenging multilingual reasoning benchmarks - MSVAMP, MGSM, and MNumGLUESub. The results show that MAPO can significantly improve the multilingual reasoning capabilities of various base models, achieving up to 16.2%, 6.1%, and 13.3% accuracy improvements on the three benchmarks, respectively. The improvements are particularly notable on the out-of-domain MSVAMP dataset, demonstrating the generalizability of the approach.
The analysis further confirms that the key to the performance gains is the alignment of reasoning processes across languages, as evidenced by the improved Answer Consistency Ratio (ACR) and reduced Perplexity (PPL) scores. The framework is also shown to be robust across different translation models used for preference estimation.
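One plausible reading of the Answer Consistency Ratio is the fraction of questions whose non-English final answer matches the English one; a minimal sketch under that assumption (the function name and signature are hypothetical):

```python
def answer_consistency_ratio(en_answers: list[str],
                             non_en_answers: list[str]) -> float:
    """Fraction of questions where the final answer produced in a
    non-English language matches the answer produced in English.
    Assumes the two lists are aligned question-by-question."""
    if not en_answers:
        return 0.0
    matches = sum(a == b for a, b in zip(en_answers, non_en_answers))
    return matches / len(en_answers)
```

A higher ratio indicates that reasoning in the two languages converges on the same answers, which is the alignment effect the analysis attributes the gains to.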
Key takeaways from the paper by Shuaijie She... on arxiv.org, 04-16-2024. https://arxiv.org/pdf/2401.06838.pdf