The paper investigates the fact selection problem in LLM-based automated program repair (APR): which facts about a bug should be included in the repair prompt. The authors conduct a large-scale study of over 19K prompts, each combining a different subset of seven diverse facts, to repair 314 bugs from open-source Python projects.
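As a rough illustration of what fact selection means in practice, the sketch below assembles a repair prompt from a chosen subset of facts. Only code context and angelic values are named in this summary; the other fact names, the bug record fields, and the rendering helpers are hypothetical, and the paper's actual prompt templates may differ.

```python
# Hypothetical registry mapping fact names to renderers over a bug record.
# "code_context" (syntactic) and "angelic_values" (semantic) come from the
# summary; the remaining names are illustrative placeholders.
FACTS = {
    "buggy_function": lambda bug: bug["source"],
    "code_context":   lambda bug: bug["surrounding_code"],
    "failing_test":   lambda bug: bug["test_source"],
    "error_message":  lambda bug: bug["traceback"],
    "angelic_values": lambda bug: bug["angelic_values"],
}

def build_repair_prompt(bug: dict, selected_facts: list[str]) -> str:
    """Concatenate the rendered facts for one bug into a single repair prompt."""
    sections = [f"## {name}\n{FACTS[name](bug)}" for name in selected_facts]
    sections.append("## Task\nFix the bug in the function above.")
    return "\n\n".join(sections)
```

With seven candidate facts there are 2^7 = 128 possible subsets per bug, which is how a study over 314 bugs yields tens of thousands of distinct prompts.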
The key findings are:
Each fact, from simple syntactic details such as the code context to semantic information such as angelic values, helps fix some bugs that would otherwise remain unresolved or be fixed only at a low success rate.
The effectiveness of program repair prompts is non-monotonic in the number of facts used: including too many facts degrades outcomes. This is likely because LLMs do not robustly use information in long input contexts and are negatively affected by irrelevant information.
There is no universal fact set that is effective, relative to other sets, on all subsets of the bugs. The authors therefore developed MANIPLE, a statistical model that selects facts based on the features of a specific bug; it significantly outperforms any universal fact selection strategy (a toy sketch of feature-conditioned selection follows these findings).
The authors benchmarked MANIPLE against state-of-the-art zero-shot, non-conversational LLM-based bug repair methods. On their test set, MANIPLE repaired 17% more bugs, underscoring the practical impact of the fact selection problem.
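The summary describes MANIPLE only as a statistical model that picks facts contingent on bug features. As one plausible instantiation (not the authors' actual design), the sketch below trains one binary classifier per fact to predict whether including that fact helps repair a given bug; the features, classifier choice, and label scheme are all assumptions.

```python
# A sketch of bug-feature-conditioned fact selection in the spirit of MANIPLE.
# Everything below (features, labels, logistic regression) is illustrative.
from sklearn.linear_model import LogisticRegression

FACT_NAMES = ["code_context", "failing_test", "error_message", "angelic_values"]

def bug_features(bug: dict) -> list[float]:
    # Hypothetical per-bug features the selector conditions on.
    return [
        float(len(bug["source"])),       # size of the buggy function
        float(bug["num_failing_tests"]), # how many tests fail
        float(bug["has_traceback"]),     # whether an error message exists
    ]

class FactSelector:
    """One classifier per fact: should this fact go into the repair prompt?"""

    def __init__(self) -> None:
        self.models = {name: LogisticRegression() for name in FACT_NAMES}

    def fit(self, bugs: list[dict], labels: dict[str, list[int]]) -> None:
        # labels[name][i] == 1 if including fact `name` helped repair bugs[i].
        X = [bug_features(b) for b in bugs]
        for name, model in self.models.items():
            model.fit(X, labels[name])

    def select(self, bug: dict) -> list[str]:
        x = [bug_features(bug)]
        return [n for n, m in self.models.items() if m.predict(x)[0] == 1]
```

The per-bug selection is what distinguishes this approach from a universal strategy, which would return the same fact subset for every bug regardless of its features.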
Source: Nikhil Paras... (arxiv.org, 04-09-2024), https://arxiv.org/pdf/2404.05520.pdf