Core Concepts
Language models can be prompted with logical demonstrations to generate plausible explanations for reasoning tasks over knowledge bases; constraining their outputs and verifying the correctness of intermediate reasoning steps are key to improving reasoning performance.
Abstract
This paper investigates the reasoning capabilities of language models (LMs) from a symbolic perspective. The authors curate synthetic datasets of paired, equivalent natural-language and symbolic data, where the symbolic side consists of first-order logic rules and predicates drawn from non-parametric knowledge bases (KBs). This pairing enables automated verification of intermediate reasoning results.
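A minimal sketch of what such a pairing might look like and how the symbolic side supports automated checking; the predicates, entity names, and field names below are illustrative assumptions, not the paper's actual dataset schema.

```python
# Hypothetical KB of ground facts: (predicate, arg1, arg2).
KB_FACTS = {
    ("parent_of", "alice", "bob"),
    ("parent_of", "bob", "carol"),
}

# An equivalent natural-language / symbolic pair (illustrative only).
example_pair = {
    "natural_language": ("Alice is a grandparent of Carol because Alice is a "
                         "parent of Bob and Bob is a parent of Carol."),
    "symbolic": [("parent_of", "alice", "bob"),
                 ("parent_of", "bob", "carol")],
}

def verify_steps(steps, kb):
    """Return True iff every intermediate symbolic step is a ground fact in the KB."""
    return all(step in kb for step in steps)

print(verify_steps(example_pair["symbolic"], KB_FACTS))  # True
```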
The authors propose a method called Language Models as Logic Programmers (LMLP), which learns from demonstrations containing logic rules and corresponding examples to iteratively reason over KBs. LMLP recovers Prolog's backward chaining algorithm and supports automated verification of LMs' outputs.
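To make the backward-chaining connection concrete, here is a hedged sketch of how LMLP-style reasoning can mirror Prolog's backward chaining: a goal is either matched against KB facts or expanded by a rule, and every intermediate step is verified against the KB. The KB, the rule format, and the `lm_select_rule` stub are assumptions for illustration; in the paper the rule/predicate selection is done by an LM prompted with demonstrations, not by this hand-written matcher.

```python
from itertools import product

# KB of ground facts: (predicate, arg1, arg2).
KB_FACTS = {
    ("parent_of", "alice", "bob"),
    ("parent_of", "bob", "carol"),
}

# Rules: head (with variables X, Y, Z) -> body predicates sharing those variables.
RULES = {
    ("grandparent_of", "X", "Z"): [("parent_of", "X", "Y"),
                                   ("parent_of", "Y", "Z")],
}

ENTITIES = {arg for fact in KB_FACTS for arg in fact[1:]}

def lm_select_rule(goal, rules):
    """Stand-in for the LM: in LMLP the prompted model picks which rule to
    expand next; here we simply match on the head predicate."""
    for head, body in rules.items():
        if head[0] == goal[0]:
            return head, body
    return None

def prove(goal, depth=3):
    """Depth-limited backward chaining. Returns a list of KB-verified facts
    proving `goal`, or None if no proof is found."""
    if goal in KB_FACTS:                      # a ground fact verified in the KB
        return [goal]
    if depth == 0:
        return None
    choice = lm_select_rule(goal, RULES)
    if choice is None:
        return None
    head, body = choice
    binding = dict(zip(head[1:], goal[1:]))   # bind head variables to goal args
    free_vars = sorted({v for sub in body for v in sub[1:] if v not in binding})
    # Naive grounding of free variables over KB entities; in LMLP the LM
    # narrows this search instead of enumerating everything.
    for values in product(ENTITIES, repeat=len(free_vars)):
        trial = {**binding, **dict(zip(free_vars, values))}
        proof = []
        for sub in body:
            subgoal = (sub[0],) + tuple(trial.get(a, a) for a in sub[1:])
            subproof = prove(subgoal, depth - 1)
            if subproof is None:
                break
            proof.extend(subproof)
        else:                                 # every subgoal was proved
            return proof
    return None

print(prove(("grandparent_of", "alice", "carol")))
# e.g. [('parent_of', 'alice', 'bob'), ('parent_of', 'bob', 'carol')]
```

Because each returned step is a fact checked against the KB, the proof trace itself serves as the automatically verifiable intermediate output described above.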
The authors conduct comprehensive experiments to systematically compare LMLP with Chain-of-Thought (CoT) prompting in deductive reasoning settings. The results show that LMLP achieves more than 25% higher accuracy than CoT on length generalization benchmarks, even with smaller model sizes.
The authors also analyze the LMLP method in depth, finding that:
Prompting with multiple examples and prompt ensembling can further boost the reasoning capability of LMLP (see the sketch after this list).
LMLP can be robust to large search spaces, maintaining a success rate of over 38% even when the majority of facts in the KB are noisy.
Larger language models generally lead to better performance for LMLP.
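The following is a hedged sketch of what prompt ensembling could look like in this setting: the same query is attempted under several demonstration prompts, and the query counts as proved if any attempt yields a KB-verified proof. The prompts and the `prove_with_prompt` stub are illustrative assumptions, not the paper's code or API.

```python
KB_FACTS = {("parent_of", "alice", "bob"), ("parent_of", "bob", "carol")}

def prove_with_prompt(goal, prompt):
    """Stand-in for one LMLP run seeded with one set of demonstrations.
    Here it merely simulates that only some prompts yield a verified proof."""
    if "grandparent" in prompt and goal == ("grandparent_of", "alice", "carol"):
        proof = [("parent_of", "alice", "bob"), ("parent_of", "bob", "carol")]
        return proof if all(f in KB_FACTS for f in proof) else None
    return None

def ensemble_prove(goal, prompts):
    """Return the first KB-verified proof produced by any prompt, else None."""
    for prompt in prompts:
        proof = prove_with_prompt(goal, prompt)
        if proof is not None:
            return proof
    return None

prompts = ["demonstrations about sibling relations",
           "demonstrations about grandparent relations"]
print(ensemble_prove(("grandparent_of", "alice", "carol"), prompts))
```

First-success selection is only one way to combine prompts; voting over the returned proofs is another natural choice.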
The paper provides insights into the role of symbolic representations in improving the reasoning capabilities of language models through in-context learning.
Stats
LMLP can achieve more than 25% higher accuracy than CoT on length generalization benchmarks.
LMLP maintains a success rate of over 38% even when the majority of facts in the KB are noisy.