Core Concepts
A novel intrinsic self-correction framework, Learning from Correctness (LECO), can significantly improve the reasoning performance of large language models across various tasks by progressively accumulating correct reasoning steps without relying on external feedback or handcrafted prompts.
Abstract
The paper introduces a novel intrinsic self-correction framework called Learning from Correctness (LECO) to enhance the multi-step reasoning capabilities of large language models (LLMs).
The key insights are:
LECO focuses on learning from correct reasoning steps, rather than learning from errors, which is the predominant approach in existing self-correction methods.
LECO measures the confidence of each reasoning step directly from the model's generation logits, without requiring additional tokens or external tools.
Experiments on various multi-step reasoning tasks, including arithmetic, commonsense, and logical reasoning, demonstrate that LECO can significantly improve the reasoning performance of both closed-source (GPT-3.5, GPT-4) and open-source (DeepSeekMath-RL-7B) models, while also reducing token consumption compared to baseline methods.
LECO completely eliminates the need for prompt engineering, which is a common challenge in existing reasoning enhancement approaches.
The paper also includes detailed analyses, such as ablation studies, rethink analysis, and an early stop strategy, to further understand the effectiveness of the LECO framework.
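To make the logit-based confidence idea concrete, here is a minimal sketch of one plausible scoring scheme: average per-token log-probability as a proxy for step confidence, used to find the least-confident step from which to rethink. The function names and the exact scoring formula are illustrative assumptions, not the paper's precise definition (LECO combines several logit-derived signals).

```python
def step_confidence(token_logprobs):
    """Proxy for step confidence: mean token log-probability.

    token_logprobs: one log-probability per generated token in a
    single reasoning step. Values closer to 0 mean the model was
    more certain of each token it emitted.
    """
    return sum(token_logprobs) / len(token_logprobs)


def lowest_confidence_step(steps):
    """Index of the least-confident reasoning step.

    steps: list of per-step token log-probability lists. Earlier,
    higher-confidence steps would be kept as 'correct' context and
    generation restarted from the step returned here.
    """
    scores = [step_confidence(s) for s in steps]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-token logprobs for three reasoning steps;
# step index 1 has the lowest average confidence.
steps = [
    [-0.05, -0.10, -0.02],
    [-1.20, -0.90, -1.50],
    [-0.30, -0.25, -0.40],
]
```

Under this sketch, steps before the returned index are accumulated as trusted context, which matches the framework's emphasis on learning from correct steps rather than correcting errors after the fact.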
Stats
Jim spends 2 hours watching TV and then decides to go to bed and reads for half as long. He does this 3 times a week.
Reading for half as long as the 2 hours of TV is 1 hour, so each session is 3 hours; at 3 sessions a week, that is (2 + 1) × 3 × 4 = 36 hours of TV and reading over 4 weeks.
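The arithmetic above can be checked with a short script (variable names are my own):

```python
tv_hours = 2
reading_hours = tv_hours / 2          # reads for half as long: 1 hour
sessions_per_week = 3
weeks = 4

# Hours per session times sessions over the full period.
total_hours = (tv_hours + reading_hours) * sessions_per_week * weeks
print(total_hours)  # 36.0
```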
Quotes
"Learning from Correctness (LECO), can significantly improve the reasoning performance of large language models across various tasks by progressively accumulating correct reasoning steps without relying on external feedback or handcrafted prompts."
"LECO utilizes a unique method to measure the confidence of each reasoning step based on generation logits, without the need for additional tokens or external tools."