Core Concepts
Large Language Models (LLMs) can be effectively integrated with automated reasoners for program verification, leading to practical improvements.
Abstract
The content discusses the integration of Large Language Models (LLMs) and automated reasoners for automated program verification. It proposes a novel framework called LEMUR, which combines the abstract high-level reasoning capabilities of LLMs with the precise low-level reasoning abilities of automated reasoners. The methodology involves using LLMs to propose program invariants as sub-goals, which are then validated by automated reasoners. The article outlines the rules and procedures of the LEMUR calculus, demonstrating its soundness and efficiency through experimental evaluations on synthetic and competition benchmarks. Notably, LEMUR outperforms existing AI-powered and conventional verification tools in terms of efficiency and effectiveness.
Abstract
Introduction to the integration of Large Language Models (LLMs) and automated reasoners for program verification.
Proposal of a novel framework named LEMUR for combining LLMs' abstract reasoning with automated reasoners' precise logic.
Description of the methodology involving LLMs proposing invariants validated by automated reasoners.
Demonstration of the soundness and efficiency of the LEMUR calculus through experimental evaluations on various benchmarks.
Methodology
Proposal of a general methodology combining LLMs and automated reasoners for program verification.
Description of how LEMUR utilizes LLMs to suggest invariants validated by automated reasoners.
Overview of the proof system within the LEMUR calculus.
Discussion on strategies to instantiate and optimize the performance of LEMUR.
Results
Comparison between ESBMC, UAUTOMIZER, and different versions of GPT models within the context of Code2Inv benchmarks.
Evaluation on hard SV-COMP benchmarks showcasing superior performance by LEMUR(GPT4).
Stats
この論文はICLR 2024で発表された会議論文です。
提案されたLEMURフレームワークは、LLMと自動理由付け機を組み合わせてプログラムの検証を行うことを目的としています。
Quotes
LLMと自動理由付け機を組み合わせることで、プログラムの検証において実用的な改善がもたらされます。