Core Concepts
A novel planning-based approach, called Plan of Thoughts (PoT), that leverages a large language model's self-reflective reasoning capabilities to guide multi-step problem solving through a Partially Observable Markov Decision Process (POMDP) formulation.
Abstract
The paper presents a novel approach called Plan of Thoughts (PoT) that extends the previous Tree of Thoughts (ToT) method to enable more effective multi-step reasoning with large language models.
The key contributions are:
Formalizing the process of "thinking" and "reasoning" with a language model as a POMDP, where the language model's own reflections about the value of a state are used as a search heuristic.
Leveraging the online POMCP solver, PoT demonstrates a superior success rate of 89.4% on the Game of 24 task compared to existing approaches, while also offering better anytime performance characteristics.
The PoT approach works as follows:
The problem is represented as a POMDP with state space S̄ = S × U, where s ∈ S are the sub-steps of the decomposed problem and u ∈ U is the unobservable, true underlying value being estimated.
The actions A = {continue, rollback, think} allow the agent to expand the current branch, backtrack, or generate a new thought.
The observation space O consists of the language model's judgments about the value of a given state, obtained by asking the model to rate the state as "sure", "likely", or "impossible".
The transition function T and reward function R are defined based on the problem structure and language model evaluations.
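The POMDP components above can be sketched in code. This is a minimal illustration, not the paper's implementation: the class names and the numeric reward mapping from observation labels are assumptions for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    CONTINUE = "continue"   # expand the current branch
    ROLLBACK = "rollback"   # backtrack from the current sub-step
    THINK = "think"         # generate a new thought

class Observation(Enum):
    # the language model's rating of a state, as described above
    SURE = "sure"
    LIKELY = "likely"
    IMPOSSIBLE = "impossible"

@dataclass(frozen=True)
class State:
    sub_steps: tuple[str, ...]  # s in S: the partial reasoning trace
    true_value: float           # u in U: unobservable underlying value

# A simple scalar reward derived from the model's judgment; the exact
# mapping is a hypothetical choice, not taken from the paper.
OBS_REWARD = {
    Observation.SURE: 1.0,
    Observation.LIKELY: 0.5,
    Observation.IMPOSSIBLE: 0.0,
}
```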
By solving this POMDP using the anytime POMCP solver, PoT is able to dynamically explore the most promising paths while maintaining strong anytime performance, outperforming previous approaches like ToT and Chain-of-Thought on the Game of 24 task.
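The anytime behavior can be illustrated with a stripped-down Monte Carlo planning loop: simulations run until a wall-clock budget expires, and the best action found so far is returned. This is a sketch of the anytime idea only, not the full POMCP algorithm (no belief tree or particle filtering); all names and the UCB1 selection rule are illustrative assumptions.

```python
import math
import random
import time

def anytime_search(root_belief, actions, simulate, time_budget=1.0, c_uct=1.4):
    """Run rollouts until the budget expires, then return the action with
    the highest mean return. `simulate(state, action)` is a caller-supplied
    rollout returning a scalar return."""
    stats = {a: [0, 0.0] for a in actions}  # action -> [visits, total return]
    deadline = time.monotonic() + time_budget
    total = 0
    while time.monotonic() < deadline:
        state = random.choice(root_belief)   # sample a state from the belief
        def ucb(a):
            n, q = stats[a]
            if n == 0:
                return float("inf")          # try every action at least once
            return q / n + c_uct * math.sqrt(math.log(total + 1) / n)
        a = max(actions, key=ucb)            # UCB1 selection at the root
        ret = simulate(state, a)             # rollout from the sampled state
        stats[a][0] += 1
        stats[a][1] += ret
        total += 1
    return max(actions, key=lambda a: stats[a][1] / max(stats[a][0], 1))
```

Because the loop can be cut off at any deadline, easy problems consume little time while hard ones receive more simulations, which is the property the paper contrasts with ToT's fixed tree expansion.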
The authors also demonstrate a hybrid language modeling approach, using GPT-3.5-Turbo-Instruct for thought generation and GPT-4-Turbo for evaluation, which allows for more efficient problem solving.
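The hybrid setup can be sketched as one step of generate-then-evaluate: a cheaper model proposes candidate thoughts and a stronger model rates each as "sure", "likely", or "impossible". The `generator` and `evaluator` callables here are hypothetical stand-ins for the GPT-3.5-Turbo-Instruct and GPT-4-Turbo calls; no real API interface is assumed.

```python
def solve_step(state, generator, evaluator, n_thoughts=5):
    """One hybrid step: generate n_thoughts candidates with the cheap model,
    score each with the strong model, and keep viable branches best-first."""
    ratings = {"sure": 2, "likely": 1, "impossible": 0}
    candidates = [generator(state) for _ in range(n_thoughts)]
    scored = [(ratings.get(evaluator(state, c), 0), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # drop branches the evaluator judged "impossible"
    return [c for score, c in scored if score > 0]
```

Splitting the roles this way spends the expensive model only on short classification-style calls, which is the efficiency argument behind the hybrid approach.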
Stats
The Game of 24 task involves combining four numbers using basic arithmetic operations to reach a target value of 24.
The dataset used is the same as the one curated by Yao et al. for the ToT approach, scraped from 4nums.com and sorted by difficulty.
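For concreteness, the Game of 24 rule set described above can be verified with a small brute-force checker that tries every ordering, operator choice, and bracketing of the four numbers (this helper is illustrative, not part of the paper):

```python
from itertools import permutations, product

def solve_24(nums, target=24, eps=1e-6):
    """Return an expression combining the four numbers with +, -, *, /
    that evaluates to the target, or None if no such expression exists."""
    ops = ["+", "-", "*", "/"]
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(ops, repeat=3):
            # the five bracketing shapes cover all parse trees over 4 operands
            exprs = [
                f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                f"({a}{o1}{b}){o2}({c}{o3}{d})",
                f"({a}{o1}({b}{o2}{c})){o3}{d}",
                f"{a}{o1}(({b}{o2}{c}){o3}{d})",
                f"{a}{o1}({b}{o2}({c}{o3}{d}))",
            ]
            for e in exprs:
                try:
                    if abs(eval(e) - target) < eps:
                        return e
                except ZeroDivisionError:
                    continue
    return None
```

For example, `solve_24([4, 6, 1, 1])` finds a solution (4 × 6 × 1 × 1 = 24), while `solve_24([1, 1, 1, 1])` returns None.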
Quotes
"By combining explicit planning and LM reasoning, this approach achieved state-of-the-art results on the Game of 24 and other difficult natural-language tasks such as a crossword."
"Importantly, the anytime property of the approach presented here indicates that PoT can scale its solution time dynamically to the number of 'thoughts' required for each problem as shown by the AUC value of 81.4% of our approach using time as a threshold, while ToT will take a fix-time approach to generate and evaluate an entire tree regardless of solution complexity."