Key Concepts
Entropy from large language models can be effectively used to complement prior automated program repair techniques for fault localization, patch generation efficiency, and patch correctness assessment.
Abstract
The paper explores the use of entropy from large language models (LLMs) to improve various stages of automated program repair (APR):
Fault Localization:
- Integrating entropy scores from LLMs such as InCoder, StarCoder, and Code Llama with prior fault localization techniques like SBFL, TransferFL, and LLMAO.
- Entropy-based re-ranking of suspicious lines identified by these tools can significantly improve their fault localization accuracy, especially for SBFL.
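The re-ranking idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-token log-probabilities for each line are already available from an LLM, and the function names (`line_entropy`, `rerank`) are hypothetical. Lines that tie on suspiciousness score are broken by entropy, on the intuition that less natural (higher-entropy) code is more likely buggy.

```python
def line_entropy(token_logprobs):
    """Mean negative log-probability of a line's tokens (higher = less natural)."""
    return -sum(token_logprobs) / len(token_logprobs)

def rerank(suspicious, lm_logprobs):
    """Re-rank suspicious lines: primary key is the FL suspiciousness score,
    ties are broken by higher LLM entropy.

    suspicious: list of (line_id, suspiciousness_score)
    lm_logprobs: dict mapping line_id -> list of token log-probabilities
    """
    scored = [(lid, score, line_entropy(lm_logprobs[lid]))
              for lid, score in suspicious]
    # Sort by descending suspiciousness, then descending entropy.
    return [lid for lid, _, _ in sorted(scored, key=lambda t: (-t[1], -t[2]))]
```

This directly targets the tie problem quantified below: when SBFL assigns the same score to hundreds of lines per bug, entropy provides a secondary ordering within each tie group.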
Patch Generation Efficiency:
- Introducing "entropy-delta" to measure the change in naturalness between the original buggy code and a proposed patch.
- Using entropy-delta to rank patches before running tests can reduce the number of patches that need to be evaluated by a mean of 24.
- Incorporating entropy-delta into the TBar template-based repair technique improves its efficiency across multiple projects.
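Entropy-delta can be sketched as the drop in mean token entropy from the buggy code to the patched code; a larger positive delta means the patch makes the code more natural. The sketch below is illustrative (the helper names and the assumption that token log-probabilities are precomputed are mine, not the paper's):

```python
def mean_entropy(token_logprobs):
    """Mean negative log-probability of a code snippet's tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def entropy_delta(buggy_logprobs, patch_logprobs):
    """Positive delta: the patch lowers entropy, i.e. increases naturalness."""
    return mean_entropy(buggy_logprobs) - mean_entropy(patch_logprobs)

def rank_patches(patches):
    """Order candidate patches so the most naturalness-improving one is tested first.

    patches: list of (patch_id, entropy_delta)
    """
    return [pid for pid, _ in sorted(patches, key=lambda p: -p[1])]
```

Running the test suite on patches in this order means plausible fixes tend to surface earlier, which is where the reported reduction in patches tested comes from.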
Patch Correctness Assessment:
- Analyzing the ability of entropy-delta to distinguish between correct and plausible but incorrect patches.
- Entropy-delta ranks 49% more correct patches in the Top-1 position compared to the state-of-the-art Shibboleth patch ranker.
- Entropy-delta also outperforms Panther, the state-of-the-art patch classifier, by 18% in precision and 10% in F1 score.
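For correctness assessment, the same signal can serve as a simple classifier: since correct patches tend to lower entropy more than incorrect ones, thresholding the delta separates the two classes. A minimal sketch (the threshold value and function name are illustrative assumptions, not the paper's):

```python
def classify_patch(delta, threshold=0.0):
    """Label a patch by its entropy-delta.

    Assumption for illustration: a patch that lowers entropy (delta > threshold)
    is more likely correct than one that raises it.
    """
    return "likely-correct" if delta > threshold else "likely-incorrect"
```

In practice the threshold would be tuned on labeled plausible patches; ranking patches by delta and taking the Top-1 corresponds to the comparison against Shibboleth above.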
The results demonstrate that entropy from LLMs can effectively complement prior APR techniques, improving fault localization, patch generation efficiency, and patch correctness assessment, while minimizing dependencies on test suites.
Statistics
SBFL assigns the same suspiciousness score to 1137 lines of code on average per bug in Defects4J.
TransferFL assigns the same suspiciousness score to 380 lines of code on average per bug in Defects4J.
Entropy-delta reduces the median number of patches tested before finding a fix across all Defects4J projects except Mockito.
Entropy-delta improves the Top-1 patch ranking by 49% and the Top-2 ranking by 27% compared to the state-of-the-art Shibboleth patch ranker.
Quotes
"Entropy can be used to rank patches before going through the entire test-suite, thereby reducing the test overhead for template-based repair technique TBar by a mean of 24 patches tested."
"Correct patches tend to lower entropy (i.e., increase naturalness) more than incorrect patches."