The content presents MAGICORE, a framework for improving Large Language Model (LLM) reasoning through adaptive coarse-to-fine refinement. It addresses three key issues in refinement:
Excessive refinement: Uniformly refining all instances can cause over-correction and reduce overall performance. MAGICORE avoids this by categorizing problems as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained, iterative multi-agent refinement.
Inability to localize and address errors: LLMs struggle to identify their own mistakes and correct them in a targeted way. MAGICORE incorporates external step-wise reward model (RM) scores to enhance error localization and generate targeted feedback.
Insufficient refinement: Deciding how many iterations of refinement are needed is non-trivial. MAGICORE employs a multi-agent loop with three agents (Solver, Reviewer, Refiner) and makes communication between the Reviewer and Refiner bidirectional, so refinement continues until it is both effective and sufficient (a sketch of the full control flow follows below).
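A minimal sketch of how these pieces might fit together, assuming an LLM solver that returns step-segmented solutions and a step-wise reward model that scores each step in [0, 1]. The names (`Candidate`, `classify_difficulty`, `coarse_aggregate`, `fine_refine`, `magicore_answer`), the 0.8 thresholds, and the callable interfaces are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of a MAGICORE-style pipeline (not the authors' code): the solver,
# reviewer, refiner, and step-wise reward model are hypothetical callables; only the
# easy/hard routing and the iterative refinement loop follow the description above.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    answer: str               # final answer extracted from a reasoning chain
    steps: List[str]          # the chain's individual reasoning steps
    step_scores: List[float]  # step-wise reward-model scores in [0, 1]


def classify_difficulty(candidates: List[Candidate], threshold: float = 0.8) -> str:
    """Label a problem 'easy' when sampled answers agree and the RM is confident."""
    _, top_count = Counter(c.answer for c in candidates).most_common(1)[0]
    agreement = top_count / len(candidates)
    mean_worst_step = sum(min(c.step_scores) for c in candidates) / len(candidates)
    return "easy" if agreement >= threshold and mean_worst_step >= threshold else "hard"


def coarse_aggregate(candidates: List[Candidate]) -> str:
    """Coarse-grained path: reward-weighted voting over the sampled answers."""
    votes: Counter = Counter()
    for c in candidates:
        votes[c.answer] += sum(c.step_scores) / len(c.step_scores)
    return votes.most_common(1)[0][0]


def fine_refine(problem: str,
                candidate: Candidate,
                reviewer: Callable[[str, Candidate], str],
                refiner: Callable[[str, Candidate, str], Candidate],
                max_rounds: int = 3,
                accept: float = 0.8) -> Candidate:
    """Fine-grained path: the Reviewer turns low step-wise RM scores into targeted
    feedback, the Refiner revises the solution, and the revised solution goes back
    to the Reviewer, so the loop runs until every step looks sound or the budget ends."""
    for _ in range(max_rounds):
        if min(candidate.step_scores) >= accept:  # Reviewer is satisfied: stop early
            break
        feedback = reviewer(problem, candidate)   # feedback targets the weakest steps
        candidate = refiner(problem, candidate, feedback)
    return candidate


def magicore_answer(problem: str,
                    solver: Callable[[str], Candidate],
                    reviewer: Callable[[str, Candidate], str],
                    refiner: Callable[[str, Candidate, str], Candidate],
                    k: int = 8) -> str:
    """Route each problem: aggregate if it looks easy, refine iteratively if it looks hard."""
    candidates = [solver(problem) for _ in range(k)]
    if classify_difficulty(candidates) == "easy":
        return coarse_aggregate(candidates)
    best = max(candidates, key=lambda c: sum(c.step_scores) / len(c.step_scores))
    return fine_refine(problem, best, reviewer, refiner).answer
```

In this sketch the same step-wise RM scores drive all three decisions: routing a problem to the coarse or fine path, choosing which steps the Reviewer criticizes, and deciding when the Reviewer-Refiner loop can stop.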
MAGICORE is evaluated on Llama-3-8B and GPT-3.5 across five math reasoning datasets. It consistently outperforms aggregation-based methods like Best-of-k and Self-Consistency, as well as refinement-based methods like Self-Refine, while using fewer samples. The results highlight the importance of MAGICORE's selective refinement, use of RMs, and multi-agent communication.
Key ideas extracted from the source by Justin Chih-... at arxiv.org, 09-19-2024
https://arxiv.org/pdf/2409.12147.pdf