Large language models (LLMs) can synthesize and translate code, but their ability to reason about code execution and specifications is limited, especially for complex programs. They struggle to predict program outputs correctly, follow control flow, and implement specified behavior.



Can Large Language Models Effectively Reason About Code Execution and Specification?