Key Concepts
Decomposing the reasoning process of language models into two steps, THINK (discovering the task-level logic and expressing it as pseudocode) and EXECUTE (tailoring that pseudocode to each instance and simulating its execution), can significantly improve performance on a variety of algorithmic reasoning tasks.
Abstract
The paper introduces THINK-AND-EXECUTE, a novel framework for improving the algorithmic reasoning capabilities of large language models (LLMs).
THINK phase:
- The Instructor LM analyzes a few example instances of a given task and generates a pseudocode prompt that captures the underlying logic required to solve the task.
- The pseudocode prompt is designed to be task-level: it applies to any instance of the task rather than being instance-specific (a minimal sketch of this phase follows below).
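The following is a minimal, hypothetical sketch of the THINK phase. The `instructor_lm` wrapper and the prompt template are illustrative placeholders of my own, not the paper's actual implementation.

```python
# A minimal sketch of the THINK phase, assuming a hypothetical
# `instructor_lm` callable that wraps whatever LLM client you use
# (this wrapper is not part of the paper).

def instructor_lm(prompt: str) -> str:
    """Hypothetical Instructor LM call; wire this to a real LLM client."""
    raise NotImplementedError

# Illustrative prompt template (an assumption, not the paper's exact prompt).
THINK_TEMPLATE = """\
You will be shown a few example instances of a task.
Write a task-level pseudocode prompt (a Python-like function with
comments and print statements for intermediate steps) that solves
ANY instance of this task, not just the examples shown.

Example instances:
{examples}

Pseudocode prompt:
"""

def think(task_examples: list[str]) -> str:
    """THINK: analyze a few example instances and return a task-level
    pseudocode prompt capturing the shared underlying logic."""
    prompt = THINK_TEMPLATE.format(examples="\n\n".join(task_examples))
    return instructor_lm(prompt)
```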
EXECUTE phase:
- The Reasoner LM takes the generated pseudocode prompt and the input instance, and simulates the execution of the pseudocode step-by-step.
- The Reasoner LM outputs the intermediate reasoning steps as a chain of thought along with the final answer (sketched below).
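A matching sketch of the EXECUTE phase, under the same assumptions; `reasoner_lm` and the template are hypothetical stand-ins, not the paper's code.

```python
# A minimal sketch of the EXECUTE phase under the same assumptions;
# `reasoner_lm` is a hypothetical stand-in for the Reasoner LM call.

def reasoner_lm(prompt: str) -> str:
    """Hypothetical Reasoner LM call; wire this to a real LLM client."""
    raise NotImplementedError

# Illustrative prompt template (an assumption, not the paper's exact prompt).
EXECUTE_TEMPLATE = """\
Simulate the execution of the pseudocode below on the given input,
step by step. Write out each intermediate `print` as your chain of
thought, then state the final answer.

Pseudocode:
{pseudocode}

Input instance:
{instance}

Execution trace:
"""

def execute(pseudocode: str, instance: str) -> str:
    """EXECUTE: apply the task-level pseudocode to one instance and
    simulate its execution, returning the reasoning trace and answer."""
    return reasoner_lm(
        EXECUTE_TEMPLATE.format(pseudocode=pseudocode, instance=instance)
    )
```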
The authors demonstrate that THINK-AND-EXECUTE outperforms strong baselines like Chain-of-Thought and Program-of-Thought prompting across various algorithmic reasoning tasks from the Big-Bench Hard benchmark. They also show that the pseudocode prompt generated by an LLM can be effectively transferred to smaller language models, improving their reasoning abilities.
The key insights are:
- Discovering task-level logic and expressing it in pseudocode is more effective than generating instance-specific code (see the toy example after this list).
- Simulating the execution of pseudocode helps guide the reasoning of LLMs better than natural language plans.
- LLMs pre-trained on code corpora can better understand and apply the task-level pseudocode prompt.
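As a toy illustration (not taken from the paper), here is what a task-level pseudocode prompt might look like for a BBH-style object-swapping task: the function names no specific instance, so a Reasoner LM can simulate it on any input, and the print statements double as the chain of thought.

```python
# Toy task-level pseudocode prompt for a BBH-style object-swapping task
# (illustrative only; not the paper's actual prompt).

def track_swapped_objects(initial_assignment: dict, swaps: list) -> dict:
    # Start from the initial person-to-object assignment.
    state = dict(initial_assignment)
    print(f"Initial state: {state}")
    # Apply each pairwise swap in order, printing the intermediate
    # state so the simulated execution doubles as a chain of thought.
    for a, b in swaps:
        state[a], state[b] = state[b], state[a]
        print(f"After {a} <-> {b}: {state}")
    return state

# One concrete instance; the prompt above stays instance-agnostic.
final = track_swapped_objects(
    {"Alice": "ball", "Bob": "book", "Claire": "hat"},
    [("Alice", "Bob"), ("Bob", "Claire")],
)
print(f"Final answer: {final}")
```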
Stats
"Algorithmic reasoning has long been a formidable challenge for LLMs, as it requires to scrutinize a complicated reasoning pattern and to translate it into a long sequence of reasoning steps."
"We demonstrate that THINK-AND-EXECUTE achieves notable improvements over strong baselines, including Chain-of-Thought and Program-of-Thought prompting, across various algorithmic tasks in Big-Bench Hard."
"We empirically show that the pseudocode prompt discovered by an LLM can be applied to small LMs (SLMs), such as CodeLlama-7B, to boost their reasoning ability."
Quotes
"To improve the reasoning capabilities of LLMs, prior works have primarily pursued two directions. The first direction includes enhancing the reasoning execution step by generating a rationale in natural language (e.g., Chain-of-Thought (Wei et al., 2022; Kojima et al., 2022)) or a piece of code (e.g., Program-of-Thought (Chen et al., 2023), Program-Aided LMs (Gao et al., 2023))."
"The second direction involves explicitly generating a plan described in natural language with LLMs. The plan describes the logic of the task and the LLM would subsequently concretize it into a sequence of reasoning steps (e.g., Least-to-Most (Zhou et al., 2022b), Plan-and-Solve (Wang et al., 2023))."