Key Concepts
Training language models to generate, emulate, and search through pseudo-code representations of reasoning processes significantly improves their performance on a variety of tasks, including those requiring "softer" reasoning skills.
Summary
This research paper introduces COGEX (Code Generation and Emulated EXecution), a novel approach to enhance the reasoning capabilities of language models (LMs). The core idea is to train LMs to generate and execute pseudo-code, which includes undefined functions whose execution is emulated by the LM itself. This allows the models to leverage their knowledge to fill in the gaps and perform reasoning tasks that are not easily expressed in traditional code.
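To make this concrete, here is an illustrative sketch (not taken from the paper) of the kind of pseudo-program described; the helper names `recall_relevant_facts` and `rate_option`, and the example output, are hypothetical placeholders for undefined calls whose results the LM infers rather than computes:

```python
# Illustrative pseudo-program in the style described above (hypothetical names).
# The two helper calls are deliberately left undefined ("pseudo-code"): a COGEX-trained
# LM, not a Python interpreter, emulates what they would return at execution time.

def answer_commonsense_question(question: str, options: list[str]) -> dict:
    facts = recall_relevant_facts(question)                                    # undefined: emulated by the LM
    scores = {option: rate_option(question, facts, option) for option in options}  # undefined: emulated by the LM
    best = max(scores, key=scores.get)
    return {"facts": facts, "scores": scores, "answer": best}

# The model then "executes" a call by emitting the result it infers, e.g. (hypothetical):
# answer_commonsense_question("Where would you store a spare blanket?", ["closet", "oven"])
# -> {"facts": "...", "scores": {"closet": 0.9, "oven": 0.1}, "answer": "closet"}
```

Because the helpers are undefined, Python itself could not run this function; the model's emulated execution supplies their outputs.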
The authors propose a three-step process:
- Pseudo-Program Generation: LMs are trained to generate Python functions based on natural language instructions and optional inputs. These functions are referred to as "pseudo-programs" because they may contain undefined function calls.
- Emulated Execution: The LMs are further trained to emulate the execution of these pseudo-programs, including the undefined functions. This allows the models to use their internal knowledge to interpret and execute these functions, effectively bridging the gap between code and less-defined reasoning processes.
- Program Search (COTACS): To adapt the COGEX model to new tasks, the authors introduce COTACS (COGEX Task Adaptation via Code Search). This method searches for a single, general-purpose program that performs well on a given dataset. That program can then be applied to new instances of the task by simply changing the input arguments; a minimal sketch of this search loop follows the list.
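The sketch below shows what such a search loop could look like, assuming hypothetical `generate_program` and `emulate` callables that wrap the COGEX model; it is a sketch under those assumptions, not the authors' implementation:

```python
# Minimal sketch of a COTACS-style program search loop. The callables passed in
# (`generate_program`, `emulate`) are assumed wrappers around the COGEX model and
# are not part of any published API.

from typing import Callable, List, Tuple

def cotacs_search(
    train_set: List[Tuple[str, str]],          # (input, gold answer) pairs
    generate_program: Callable[[str], str],    # input -> candidate pseudo-program
    emulate: Callable[[str, str], str],        # (program, input) -> emulated answer
    top_k: int = 3,
) -> List[str]:
    # One candidate pseudo-program per training input.
    candidates = [generate_program(x) for x, _ in train_set]

    # Score each candidate by how often its emulated execution matches the gold answer.
    def accuracy(program: str) -> float:
        hits = sum(emulate(program, x) == y for x, y in train_set)
        return hits / len(train_set)

    # Keep the top_k programs; at test time they are applied to new instances
    # simply by changing the input arguments.
    return sorted(candidates, key=accuracy, reverse=True)[:top_k]
```

Keeping `top_k` above 1 mirrors the finding reported below that retaining the top 3 programs generally beats keeping only the single best one.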
The researchers evaluated COGEX on a diverse set of reasoning tasks, including commonsense question answering, text classification, and math problem solving. Their results demonstrate that COGEX, coupled with COTACS, consistently outperforms standard in-context learning approaches, particularly in few-shot settings.
The paper also explores several ablation studies to understand the contribution of different components of COGEX. For instance, they found that using pseudo-code leads to better performance compared to only using natural language plans, especially for tasks that benefit from a programmatic approach.
The authors acknowledge that COGEX has limitations, such as the potential for inconsistent code execution and the computational overhead of emulating code execution. However, they believe that COGEX represents a significant step towards applying code generation to a broader range of reasoning problems.
Statistics
COGEX with COTACS outperforms baseline models on 8 out of 9 tasks using both 7B and 13B Llama-2 models.
Using only 10 training examples with COTACS still yields better performance than baseline models on most tasks.
Retaining the top 3 performing programs during COTACS search generally improves performance compared to using only the single best program.
COGEX models trained on code-based instructions show comparable performance to text-only Alpaca models on standard instruction following tasks.
COTACS outperforms fine-tuning on 4 out of 9 tasks when using 500 training examples.
Quotes
"Our goal is to expand an LM’s program synthesis skills to such softer reasoning tasks."
"COGEX thus suggests a way to leverage the flexible ad-hoc reasoning abilities of LMs while encouraging programmatic reasoning via the program structure."
"Overall, this work provides a significant step in showing how code generation can be applied to a much broader class of problems than previously considered."