This paper introduces AutoRace, a fully automated method for evaluating reasoning chains generated by Large Language Models (LLMs), and LLM Reasoners, a unified formulation and library for diverse step-by-step reasoning algorithms. The authors also conduct an extensive analysis of the design choices that most affect LLM reasoning performance.
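To make the unified formulation concrete, here is a minimal sketch of how diverse step-by-step reasoning algorithms can be expressed as a search over partial reasoning chains, combining a step proposer and a reward function. This is an illustrative reconstruction, not the actual API of the LLM Reasoners library; the names `StepGenerator`, `RewardFn`, `Chain`, and `beam_search` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces for the two pluggable components: an LLM-backed
# proposer of candidate next steps, and a scorer of partial chains.
StepGenerator = Callable[[str, List[str]], List[str]]  # (question, chain) -> candidate steps
RewardFn = Callable[[str, List[str]], float]           # (question, chain) -> score

@dataclass
class Chain:
    steps: List[str] = field(default_factory=list)
    score: float = 0.0

def beam_search(question: str,
                propose: StepGenerator,
                reward: RewardFn,
                beam_width: int = 3,
                max_depth: int = 5) -> Chain:
    """Generic beam search over reasoning steps: expand each kept partial
    chain with LLM-proposed steps, score the results, keep the best."""
    beam = [Chain()]
    for _ in range(max_depth):
        candidates = []
        for chain in beam:
            for step in propose(question, chain.steps):
                new_steps = chain.steps + [step]
                candidates.append(Chain(new_steps, reward(question, new_steps)))
        if not candidates:
            break
        beam = sorted(candidates, key=lambda c: c.score, reverse=True)[:beam_width]
    return beam[0]
```

Under this view, greedy chain-of-thought decoding is `beam_width=1`, while tree-structured methods correspond to wider beams or other search policies over the same proposer/reward interface.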
EURUS, a suite of large language models optimized for reasoning, achieves state-of-the-art results on diverse benchmarks covering mathematics, code generation, and logical reasoning by leveraging ULTRAINTERACT, a newly curated, large-scale, high-quality alignment dataset designed for complex reasoning tasks.
Structured prompting techniques, such as Chain-of-Thought, Tree of Thoughts, and Graph of Thoughts, significantly enhance the reasoning capabilities of large language models by guiding their reasoning through explicit intermediate steps and structured representations (chains, trees, or graphs of thoughts).
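As a simple illustration of the first of these techniques, the sketch below shows few-shot Chain-of-Thought prompting: an exemplar demonstrates intermediate steps, and a cue such as "Let's think step by step" elicits the same behavior on a new question. The `complete` function is a hypothetical stand-in for any text-completion client, not a specific library's API.

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own client here."""
    raise NotImplementedError

# One worked exemplar showing intermediate reasoning before the answer.
FEW_SHOT_COT = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

"""

def chain_of_thought(question: str) -> str:
    # The exemplar plus the step-by-step cue steers the model to emit
    # intermediate reasoning rather than jumping straight to an answer.
    prompt = FEW_SHOT_COT + f"Q: {question}\nA: Let's think step by step."
    return complete(prompt)
```

Tree of Thoughts and Graph of Thoughts generalize this idea by branching over and merging multiple candidate reasoning steps rather than committing to a single linear chain.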