Core Concepts
The LEMUR framework combines Large Language Models (LLMs) with automated reasoners so that program verification can solve problems that conventional verifiers alone struggle with.
Abstract
The article introduces LEMUR, a framework that combines Large Language Models (LLMs) and automated reasoners for automated program verification. Its methodology draws on the code-understanding ability of LLMs together with the rigor of automated reasoners. The article lays out the theoretical foundation of LEMUR, presents a practical algorithmic instantiation, and reports its performance on several benchmark sets. Notably, LEMUR shows promising results on challenging benchmarks that conventional verifiers struggle with.
Structure:
Introduction to LEMUR and its purpose.
Overview of the proposed methodology combining LLMs and automated reasoners.
Description of the theoretical foundation of LEMUR as a formal calculus.
Implementation details of the algorithmic instantiation of LEMUR.
Performance evaluation on Code2Inv benchmarks and SV-COMP benchmarks.
Comparison with existing approaches like Code2Inv and conventional verifiers.
Discussion of limitations, future research directions, and extensions.
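The interplay between an LLM and an automated reasoner outlined above can be illustrated with a minimal sketch. It is not LEMUR's actual algorithm or calculus. It assumes a toy program (x starts at 0 and is repeatedly updated to (x + 2) mod 16), a property to prove (x != 7), a hard-coded list of candidate invariants standing in for LLM proposals, and an exhaustive check over 16 states standing in for the automated reasoner. All names (`step`, `is_inductive`, `implies`, `propose_candidates`, `verify`) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[int], bool]

STATES = range(16)   # toy finite state space (assumption for illustration)
INIT = [0]           # the toy program starts with x = 0


def step(x: int) -> int:
    """One iteration of the toy loop body: x = (x + 2) mod 16."""
    return (x + 2) % 16


def is_inductive(inv: Predicate) -> bool:
    """Stand-in for the automated reasoner: exhaustive inductiveness check."""
    holds_initially = all(inv(s) for s in INIT)
    preserved = all(inv(step(s)) for s in STATES if inv(s))
    return holds_initially and preserved


def implies(inv: Predicate, prop: Predicate) -> bool:
    """Check that every state satisfying inv also satisfies prop."""
    return all(prop(s) for s in STATES if inv(s))


def propose_candidates() -> List[Tuple[str, Predicate]]:
    """Stand-in for the LLM: a fixed list of candidate invariants."""
    return [
        ("x < 10", lambda x: x < 10),          # plausible but not inductive
        ("x % 2 == 0", lambda x: x % 2 == 0),  # inductive and strong enough
    ]


def verify(prop: Predicate) -> Tuple[Optional[str], int]:
    """Return (helpful invariant or None, number of proposals examined)."""
    if is_inductive(prop):
        return ("property itself", 0)
    candidates = propose_candidates()
    for count, (name, inv) in enumerate(candidates, start=1):
        # A proposal is accepted only if the checker confirms it.
        if is_inductive(inv) and implies(inv, prop):
            return (name, count)
    return (None, len(candidates))


result = verify(lambda x: x != 7)
print(result)
```

In this sketch the property x != 7 cannot be proved by induction on its own, because the state x = 5 satisfies it but steps to 7. The first proposal is rejected by the checker, and the second is accepted. Every proposal is checked before it is trusted, so an incorrect suggestion costs time but never soundness.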
Stats
"LEMUR is able to solve 107 problems within a 10-minute time limit."
"ESBMC alone can solve 68 problems within the same time limit."
"LEMUR(GPT4) requires an average of 7.2 proposals before solving a problem."
Quotes
"The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification."
"We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification."