This paper explores whether large language models (LLMs) can learn from their mistakes and improve their reasoning. The authors introduce COTERRORSET, a new benchmark that collects both correct and incorrect Chain-of-Thought (CoT) rationales across multiple domains, together with demonstrations of why each error was made.
The authors propose two methods to leverage these mistakes:
Self-rethinking: This approach guides LLMs to reconsider their initial responses and check whether they have made similar errors in the past. If an error is detected, the model is prompted to correct its reasoning and produce a new answer.
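The self-rethinking loop might be sketched as follows. This is a minimal illustration, not the paper's exact implementation: `ask_model` is a stub standing in for a real LLM call, and the prompt wording, the `NO_ERROR` sentinel, and the error-example format are all assumptions.

```python
def ask_model(prompt: str) -> str:
    """Stub LLM call so the sketch is runnable; a real system would query a model.
    On rethink prompts it reports no error; otherwise it returns a fixed answer."""
    if "similar errors" in prompt:
        return "NO_ERROR"
    return "42"

def self_rethink(question: str, error_examples: list[str], max_rounds: int = 2) -> str:
    # Step 1: get an initial chain-of-thought answer.
    answer = ask_model(f"Question: {question}\nAnswer step by step.")
    for _ in range(max_rounds):
        # Step 2: show the model past error types and ask it to check its answer.
        critique = ask_model(
            "Here are examples of similar errors made before:\n"
            + "\n".join(error_examples)
            + f"\nQuestion: {question}\nYour answer: {answer}\n"
            "Did you make one of these errors? Reply NO_ERROR or give a correction."
        )
        if critique.strip() == "NO_ERROR":
            break  # No error detected; keep the current answer.
        answer = critique  # Step 3: adopt the corrected reasoning and re-check.
    return answer

print(self_rethink("What is 6 * 7?", ["off-by-one error in multiplication"]))
```

Bounding the loop with `max_rounds` keeps the cost of repeated self-checks predictable; the rethink step stops early as soon as the model reports no error.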
Mistake tuning: This method fine-tunes LLMs on a combination of correct and incorrect rationales, with prefixes to distinguish between them. This allows the models to better differentiate between correct and incorrect reasoning.
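Mistake tuning reduces to a data-preparation step before standard fine-tuning. A minimal sketch of building such a dataset, assuming illustrative prefix strings (the paper's actual prefix tokens may differ):

```python
# Illustrative prefixes; the exact marker tokens are an assumption.
CORRECT_PREFIX = "[CORRECT RATIONALE]"
INCORRECT_PREFIX = "[INCORRECT RATIONALE]"

def build_examples(question: str,
                   correct_rationales: list[str],
                   incorrect_rationales: list[str]) -> list[dict]:
    """Pair each rationale with its question and mark it with a prefix,
    so a fine-tuned model learns to distinguish correct from incorrect reasoning."""
    examples = []
    for r in correct_rationales:
        examples.append({"input": question, "target": f"{CORRECT_PREFIX} {r}"})
    for r in incorrect_rationales:
        examples.append({"input": question, "target": f"{INCORRECT_PREFIX} {r}"})
    return examples

data = build_examples(
    "If a train travels 60 km in 1.5 hours, what is its speed?",
    correct_rationales=["Speed = distance / time = 60 / 1.5 = 40 km/h."],
    incorrect_rationales=["60 * 1.5 = 90 km/h (multiplied instead of dividing)."],
)
```

The resulting `{"input", "target"}` pairs can then be fed to any standard supervised fine-tuning pipeline; the prefixes let the model condition on whether a rationale is an example to imitate or an error to recognize.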
Experiments on various benchmarks, including arithmetic and commonsense reasoning tasks, demonstrate that both self-rethinking and mistake tuning can consistently improve the performance of LLMs compared to existing approaches. The authors also provide a detailed analysis of the common error types exhibited by LLMs, offering insights to guide future research in mitigating these issues.
Source: arxiv.org