Core Concepts
Large language models show promise for decompiling binary code, motivating the release of LLM4Decompile, the first open-source LLMs dedicated to decompilation.
Abstract
Decompilation aims to restore compiled code to human-readable source code.
Large language models (LLMs) are applied to the decompilation task.
The first open-access decompilation LLMs are released, pre-trained on 4 billion tokens of C source code (a usage sketch follows this list).
The Decompile-Eval benchmark is introduced, evaluating decompiled programs by whether they re-compile and re-execute correctly.
LLM4Decompile demonstrates improved decompilation accuracy over GPT-4.
The benchmark emphasizes evaluating decompilation models from a program-semantics perspective rather than by token-level similarity.
LLM4Decompile models show promising results in decompiling binaries.
Ablation studies show the effectiveness of the sequence-to-sequence (S2S) training approach for decompilation.
Experimental results highlight the significant improvement in decompilation capability achieved by the LLM4Decompile models.
Limitations include the focus on the C language and on decompiling single functions rather than whole programs.
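The workflow implied above — disassemble a binary, then prompt the model with the assembly to recover C source — can be sketched in a few lines. This is a minimal sketch assuming the released weights load through Hugging Face transformers; the model identifier and prompt template below are illustrative assumptions, not the authors' exact interface.

```python
# Minimal sketch: prompting a decompilation LLM with disassembled code.
# MODEL_ID and the prompt template are illustrative assumptions, not
# the exact interface published with LLM4Decompile.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "llm4decompile-6b"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# x86-64 assembly for a trivial function, as a disassembler might emit it.
asm = """
main:
    push   rbp
    mov    rbp, rsp
    mov    eax, 0
    pop    rbp
    ret
"""

prompt = f"# Assembly:\n{asm}\n# Equivalent C source:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens (the candidate C source).
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```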
Stats
LLM4Decompile demonstrates the ability to accurately decompile assembly code.
LLM4Decompile shows a 50% improvement over GPT-4.
The LLM4Decompile-6b model successfully captures program semantics in 21% of cases (see the sketch after this list).
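The 21% figure is a semantics-level success rate: a decompiled function counts only if it both re-compiles and passes runtime checks. Below is a minimal sketch of such a check in the spirit of Decompile-Eval; the gcc invocation, file layout, and assert-based harness are assumptions, not the benchmark's actual implementation.

```python
# Minimal sketch of a re-compilability / re-executability check.
# The file layout, gcc flags, and assert()-based harness are
# illustrative assumptions, not Decompile-Eval's actual code.
import os
import subprocess
import tempfile

def evaluate(decompiled_c: str, test_harness_c: str) -> dict:
    """Return whether the candidate compiles and passes its tests."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.c")
        binary = os.path.join(tmp, "candidate")
        # The harness supplies main() plus assert()-based checks on
        # the decompiled function's behavior.
        with open(src, "w") as f:
            f.write(decompiled_c + "\n" + test_harness_c)

        # Re-compilability: the candidate must build without errors.
        recompilable = subprocess.run(
            ["gcc", src, "-o", binary], capture_output=True
        ).returncode == 0

        # Re-executability: the assertions must pass at runtime
        # (a failed assert() aborts with a nonzero exit code).
        re_executable = recompilable and subprocess.run(
            [binary], capture_output=True, timeout=10
        ).returncode == 0

    return {"recompilable": recompilable, "re_executable": re_executable}
```

A model's re-executability score is then the fraction of benchmark functions for which the second check passes.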
Quotes
"Large language models show promise for programming tasks, motivating their application to decompilation."
"LLM4Decompile has demonstrated the capability to accurately decompile 21% of the assembly code, which achieves a 50% improvement over GPT-4."