This paper introduces a comprehensive mathematics dataset called "MathQuest", sourced from 11th and 12th standard NCERT textbooks. The dataset covers a wide range of mathematical concepts and varying levels of complexity.
The researchers conducted fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH. The fine-tuned models were evaluated on the MathQuest dataset as well as other publicly available datasets, including GSM-8K, DeepMind, NumGLUE, and SimulEq.
The results show that among the three models, MAmmoTH-13B outperforms the others, achieving the highest accuracy on the presented mathematical problems. MAmmoTH-13B thus establishes itself as a robust and dependable baseline for solving NCERT mathematics problems.
The paper also discusses the limitations of the current approach, such as challenges in dealing with complex expressions involving nested brackets, and outlines plans for future research to further enhance the reasoning abilities of large language models for mathematical problem-solving.