Core Concept
The paper proposes a self-critique pipeline that improves both the mathematical and linguistic capabilities of large language models without relying on external supervisory models or manual annotations.
Summary
The paper introduces a novel approach, called the Self-Critique pipeline, to enhance the mathematical problem-solving abilities of large language models (LLMs) without compromising their linguistic capabilities. The key components are:
- Math-Critique Model:
  - Builds an accurate and robust evaluation model that scores mathematical responses against the question and a reference answer.
  - Provides an explanatory analysis and a score from 1 to 10 for each response.
- Rejective Fine-Tuning (RFT):
  - Applies rejection sampling: responses that fail to meet Math-Critique standards are discarded, and the remaining responses are used for further fine-tuning.
  - Aims to improve the accuracy and consistency of the model's mathematical responses while preserving diversity.
- Direct Preference Optimization (DPO):
  - Learns directly from pairs of correct and incorrect answers that are further refined through Math-Critique.
  - Focuses on the most challenging questions from the previous RFT stage.
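The data-selection side of the RFT and DPO stages can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Sample` type, the `CritiqueFn` signature, the `toy_critique` stand-in, and the score thresholds (8 and 4) are assumptions made for illustration, since the summary does not state the actual cut-offs or interfaces. The sketch also leaves out the selection of the most challenging questions for the DPO stage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: (question, reference_answer, response) -> score in 1..10.
CritiqueFn = Callable[[str, str, str], int]


@dataclass
class Sample:
    question: str
    reference: str
    responses: List[str]  # several sampled responses to the same question


def rejection_filter(
    samples: List[Sample], critique: CritiqueFn, threshold: int = 8
) -> List[Tuple[str, str]]:
    """Keep (question, response) pairs whose critique score meets the threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for s in samples:
        for r in s.responses:
            if critique(s.question, s.reference, r) >= threshold:
                kept.append((s.question, r))
    return kept


def build_preference_pairs(
    samples: List[Sample], critique: CritiqueFn, good: int = 8, bad: int = 4
) -> List[Dict[str, str]]:
    """Emit one (chosen, rejected) pair per question that has both a
    high-scoring and a low-scoring response. Both cut-offs are assumptions."""
    pairs = []
    for s in samples:
        scored = [(critique(s.question, s.reference, r), r) for r in s.responses]
        best = max(scored, key=lambda item: item[0])
        worst = min(scored, key=lambda item: item[0])
        if best[0] >= good and worst[0] <= bad:
            pairs.append(
                {"prompt": s.question, "chosen": best[1], "rejected": worst[1]}
            )
    return pairs


def toy_critique(question: str, reference: str, response: str) -> int:
    # Stand-in for a Math-Critique model: a crude substring check, used only
    # so the sketch runs without a real scoring model.
    return 10 if reference in response else 2
```

With a real critique model in place of `toy_critique`, the output of `rejection_filter` would serve as fine-tuning data, and the dictionaries from `build_preference_pairs` would serve as preference pairs for a DPO-style trainer.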
The authors also introduce the MATHUSEREVAL benchmark, designed to assess LLMs' capabilities in solving complex, open-ended mathematical queries relevant to real-world applications.
Experiments on the ChatGLM3-32B model show that the Self-Critique pipeline significantly improves mathematical problem-solving while also improving linguistic capabilities, outperforming LLMs that could be twice its size.
Statistics
The diameter of the first semicircular track is 72.6 meters.
The width of each lane is 1.25 meters.
The circumference of the first track is π × 72.6 meters.
The circumference of the second track is π × (72.6 / 2 + 1.25) × 2 meters.
The difference between the two track lengths is π × 2.5 ≈ 7.854 meters.
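The track arithmetic above can be checked with a few lines of Python. This is a small verification sketch: the variable names are illustrative, and it follows the formulas in the text by treating each track's two semicircular ends together as one full circle.

```python
import math

inner_diameter = 72.6  # metres, diameter of the first track (from the text)
lane_width = 1.25      # metres, width of each lane (from the text)

# Curved length of each track, using the same formulas as the text.
first_track = math.pi * inner_diameter
second_track = math.pi * (inner_diameter / 2 + lane_width) * 2

# The diameter terms cancel, leaving 2 * pi * lane_width.
difference = second_track - first_track
print(round(difference, 3))  # 7.854
```

The difference depends only on the lane width, not on the diameter of the inner track.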
Quotes
"Our strategy deviates from traditional RLHF by incorporating a Math-Critique model derived from the LLM itself, which evaluates its mathematical outputs."
"The Self-Critique pipeline is a weakly supervised iterative training method for enhancing mathematical abilities, originating from a single model."
"Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger."