The paper explores how to extend the two-stage training framework of question alignment and response alignment to diverse reasoning scenarios beyond mathematical reasoning with chain-of-thought. The key findings are:
The question alignment approach can be applied broadly to boost multilingual performance across various reasoning tasks, model families, and sizes. For instance, the fine-tuned LLaMA2-70B model achieves 63.0% average accuracy on the MGSM benchmark, a new performance ceiling for open-source models.
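The first stage, question alignment, fine-tunes the model to translate non-English questions into English so that questions from different languages map to a shared representation. A minimal sketch of how such X→En training instances might be formatted (the prompt template and field names are illustrative assumptions, not taken from the paper):

```python
# Sketch: building question-alignment training instances.
# The model is fine-tuned on X->En question translation pairs before
# response alignment. The prompt wording and dict fields are assumptions.

def build_question_alignment_example(src_question: str, en_question: str) -> dict:
    """Turn a non-English/English question pair into a supervised instance."""
    return {
        "prompt": f"Translate the following question into English.\n{src_question}\n",
        "completion": en_question,
    }

pairs = [
    ("¿Cuánto es 3 más 5?", "What is 3 plus 5?"),
    ("Combien font 3 plus 5 ?", "What is 3 plus 5?"),
]
dataset = [build_question_alignment_example(src, en) for src, en in pairs]
```

The same English completion for parallel questions is what pulls the different source languages toward one semantic space.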
Incorporating En-X translation data during the response alignment stage can implicitly encourage the model to generate non-English chain-of-thought, improving question-response language consistency, though this comes at the cost of some reasoning accuracy.
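The mixing step above can be sketched as follows; the mixing ratio, seed, and example format are illustrative assumptions rather than the paper's recipe:

```python
import random

# Sketch: blending En->X translation pairs into the response-alignment
# training mix, which implicitly nudges the model to produce its
# chain-of-thought in the question's language. The 20% ratio is an
# assumption for illustration, not a value from the paper.

def mix_training_data(reasoning_examples, en_x_translation_examples,
                      translation_ratio=0.2, seed=0):
    """Combine English reasoning data with a slice of En->X translation pairs."""
    rng = random.Random(seed)
    n_trans = min(int(len(reasoning_examples) * translation_ratio),
                  len(en_x_translation_examples))
    mixed = reasoning_examples + rng.sample(en_x_translation_examples, n_trans)
    rng.shuffle(mixed)
    return mixed
```

Raising the translation ratio would be expected to strengthen language consistency further while costing more reasoning accuracy, per the trade-off described above.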
The question alignment approach scales well to extremely large language models, and efficient proxy-tuning can achieve nearly the same performance as full fine-tuning without updating any parameters.
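Proxy-tuning steers a frozen large model at decoding time by adding, token by token, the logit offset between a small tuned model and its small untuned counterpart. A minimal sketch of that logit arithmetic (all values and variable names are illustrative):

```python
import math

# Sketch of proxy-tuning: the large base model's parameters stay frozen;
# its next-token logits are shifted by the difference between a small
# question-aligned model and the same small model before tuning.
# The three logit vectors below are made-up illustrative values.

def proxy_tuned_logits(large_base, small_tuned, small_base):
    """Per-token logits: large base + (small tuned - small base)."""
    return [l + (t - b) for l, t, b in zip(large_base, small_tuned, small_base)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

large = [2.0, 1.0, 0.5]   # logits from the frozen large base model
tuned = [1.0, 2.5, 0.0]   # logits from the small tuned proxy
base  = [1.0, 1.0, 1.0]   # logits from the small untuned proxy

adjusted = proxy_tuned_logits(large, tuned, base)   # [2.0, 2.5, -0.5]
probs = softmax(adjusted)
```

Here the tuned proxy's preference for the second token overrides the large model's original top choice, which is how the method transfers the effect of fine-tuning without touching the large model's weights.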
Analysis reveals that question alignment produces a more unified semantic space, facilitating the model's ability to leverage its English expertise in non-English contexts. The model also employs more consistent problem-solving processes across languages after question alignment.
The size of the question translation data is an important factor, with low-resource languages benefiting more from scaling up the data.
Key insights distilled from:
by Wenhao Zhu, S... on arxiv.org, 05-03-2024
https://arxiv.org/pdf/2405.01345.pdf