Core Concepts
Aligning the output format of large language models (LLMs) with their pre-training objective, and letting them locate and repair bugs simultaneously, can significantly improve their automated program repair (APR) performance without relying on fault localization tools.
Abstract
The paper investigates a new approach to adapting LLMs for automated program repair (APR). The key insights are:
- Changing the output format of LLMs from discrete fix hunks to the entire refined program better aligns the inference objective with the training objective of decoder-only LLMs, yielding a significant performance improvement in APR (see the prompt sketch after this list).
- Prompting LLMs with the buggy program and its corresponding artifacts (e.g., failed tests, error messages) lets them locate and repair bugs simultaneously, without relying on fault localization tools, further enhancing their APR performance.
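The contrast is easiest to see in the prompt itself. Below is a minimal sketch of how such a prompt might be assembled; the function name and prompt wording are illustrative assumptions, not the paper's exact template.

```python
# Minimal sketch of an alignment-oriented APR prompt (illustrative; not the
# paper's exact template). Instead of asking the model to infill masked
# hunks, the prompt supplies the whole buggy function plus debugging
# artifacts and requests the complete corrected function, which matches a
# decoder-only LLM's next-token (completion) training objective.

def build_debug_prompt(buggy_function: str,
                       failing_test: str,
                       error_message: str) -> str:
    # Assemble a single completion-style prompt: whole buggy function plus
    # artifacts, no fault location given; the model must find the bug itself
    # and then emit the entire repaired function.
    return (
        "The following function is buggy:\n\n"
        f"{buggy_function}\n\n"
        "It fails this test:\n\n"
        f"{failing_test}\n\n"
        "with this error message:\n\n"
        f"{error_message}\n\n"
        "Rewrite the complete function with the bug fixed."
    )


if __name__ == "__main__":
    prompt = build_debug_prompt(
        buggy_function="def add(a, b):\n    return a - b",
        failing_test="assert add(1, 2) == 3",
        error_message="AssertionError: add(1, 2) returned -1, expected 3",
    )
    print(prompt)
```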
Based on these insights, the authors developed D4C, a straightforward prompting framework for APR. D4C correctly repairs 180 bugs in Defects4J, outperforming state-of-the-art APR methods that assume perfect fault localization by 10% while sampling 90% fewer patches. The findings show that objective alignment, and replacing the traditional localize-then-repair workflow with direct debugging, are crucial for harnessing LLMs in APR.
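The overall workflow reduces to a small generate-and-validate loop. The sketch below assumes two placeholder callbacks that are not the paper's actual interfaces: `generate(prompt)` wraps some LLM sampling API, and `run_tests(candidate)` rebuilds the project with the candidate function and reruns the failing tests.

```python
# A D4C-style repair loop under two stated assumptions: `generate` and
# `run_tests` are hypothetical placeholders for an LLM sampling call and a
# test harness, respectively.

from typing import Callable, Optional


def repair(prompt: str,
           generate: Callable[[str], str],
           run_tests: Callable[[str], bool],
           num_samples: int = 10) -> Optional[str]:
    """Sample a few whole-function patches and return the first one that
    passes validation; the paper reports needing only ~10 samples per bug."""
    for _ in range(num_samples):
        candidate = generate(prompt)   # whole fixed function, not a hunk
        if run_tests(candidate):       # plausible patch: failing tests now pass
            return candidate
    return None                        # bug not repaired within the budget
```

Because each sample is a complete function, a candidate can be validated directly by recompiling and rerunning the test suite, with no hunk-to-location stitching step.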
Stats
The paper reports the following key statistics:
- D4C repairs 180 out of 437 single-function bugs in Defects4J, outperforming state-of-the-art APR methods with perfect fault localization by 10%.
- D4C needs to sample only 10 patches per bug, 90% fewer than the most efficient baseline (which samples 100-5000 patches).
Quotes
"Aligning the output from infilling discrete hunks to completing entire functions can better attain the training objective."
"Allowing LLM to locate and repair bugs with artifacts in a human-like manner can further improve its APR performance."