Core Concept
Transformer models can be trained to imitate the search dynamics of the A* algorithm and then fine-tuned to discover more efficient search strategies for solving complex planning tasks.
Summary
The authors demonstrate how to train Transformer models to solve complex planning tasks by imitating the search dynamics of the A* algorithm. They generate synthetic datasets that capture the step-by-step execution trace of A* search, in addition to the optimal plan.
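The idea of logging A*'s search dynamics is easy to sketch. Below is a minimal Python sketch (not the authors' code) of an A* search that emits an execution trace alongside the optimal plan; the "create"/"close" event names loosely mirror the paper's tokenization of frontier insertions and node expansions, and `neighbors`/`heuristic` are assumed task-supplied callables.

```python
import heapq
import itertools

def astar_with_trace(start, goal, neighbors, heuristic):
    """A* that returns (optimal_plan, execution_trace).

    Assumptions: `neighbors(state)` yields (next_state, step_cost) pairs and
    `heuristic(state)` is an admissible estimate. The trace records every
    frontier insertion ("create") and expansion ("close") with the cost-so-far
    and heuristic value -- the search-dynamics signal the models train on.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    trace = [("create", start, 0, heuristic(start))]
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale frontier entry superseded by a cheaper path
        trace.append(("close", state, g, heuristic(state)))
        if state == goal:
            return path, trace  # optimal plan plus full search dynamics
        for nxt, cost in neighbors(state):
            g_new = g + cost
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                trace.append(("create", nxt, g_new, heuristic(nxt)))
                heapq.heappush(frontier,
                               (g_new + heuristic(nxt), next(tie),
                                g_new, nxt, path + [nxt]))
    return None, trace  # unsolvable task
```

A solution-only dataset would keep just the returned `path`; the search-augmented datasets described above serialize `trace` followed by `path` into a single token sequence.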
The key insights are:
- Training Transformer models on the search execution traces, in addition to the optimal plans, significantly boosts their performance, especially in the low-data regime.
- The search-augmented Transformer models outperform larger solution-only models that directly predict the optimal plan, highlighting the importance of capturing the reasoning process.
- The authors further improve the search-augmented models through a "search dynamics bootstrapping" method, where the model is fine-tuned to discover more efficient search strategies that require fewer steps to find the optimal plan (see the sketch after this list).
- Experiments on maze navigation and Sokoban puzzle tasks show that the final Searchformer model can solve 93.7% of test tasks while using 26.8% fewer search steps compared to the original A* implementation.
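A rough sketch of one bootstrapping round, under stated assumptions: `model.sample`, `model.finetune`, and `is_optimal` are hypothetical stand-ins, and the selection rule (keep the shortest sampled sequence whose plan is optimal, then fine-tune on those) follows the summary above rather than the authors' exact recipe.

```python
def bootstrap_step(model, tasks, n_samples=32):
    """One round of search dynamics bootstrapping (hedged sketch).

    Hypothetical helpers: model.sample(prompt) returns a (trace, plan)
    pair decoded from one sampled token sequence; is_optimal(task, plan)
    checks the plan against an optimal-cost oracle such as A* itself.
    """
    new_dataset = []
    for task in tasks:
        candidates = [model.sample(task.prompt) for _ in range(n_samples)]
        correct = [(trace, plan) for trace, plan in candidates
                   if is_optimal(task, plan)]
        if correct:
            # Keeping the shortest correct trace is what nudges the model
            # toward search strategies that need fewer steps than A*.
            new_dataset.append(min(correct, key=lambda tp: len(tp[0])))
    model.finetune(new_dataset)
    return model
```

Iterating this step shortens the traces the model generates while preserving plan optimality, which is how the Searchformer ends up using fewer search steps than the A* reference it was originally trained to imitate.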
Statistics
The Searchformer model generates execution traces that are on average 26.8% shorter than the traces generated by the A* algorithm.
The Searchformer model solves 93.7% of the test Sokoban tasks.
Quotes
"Transformer-based architectures (Vaswani et al., 2017) have demonstrated impressive performance in different tasks, including holding conversations at the human level (Shuster et al., 2022; OpenAI, 2022, 2023; Touvron et al., 2023), high-quality image understanding (Caron et al., 2021; Oquab et al., 2024; Assran et al., 2023) and video generation (Singer et al., 2023), multi-modal generation (Girdhar et al., 2023; Radford et al., 2021), and code completion (Roziere et al., 2023; OpenAI, 2021)."
"Despite these successes, Transformer-based architectures and LLMs still struggle when it comes to solving planning and reasoning tasks. Previous studies demonstrate that LLMs fall short in multi-step planning tasks (Valmeekam et al., 2023a,b) or when performing higher-order reasoning (Momennejad et al., 2023; Fan et al., 2020)."