Key Concepts
Peer-aided Repairer (PaR) is a framework that helps large language models repair bugs in advanced student programming assignments by leveraging peer solutions and multi-source prompt generation.
Summary
The key highlights and insights from the content are:
- The authors curated a new dataset called Defects4DS, which contains 682 submissions from 4 programming assignments of a higher-level programming course. Compared with introductory programming assignment datasets, its programs are more complex, longer, and more varied in structure.
- The authors analyzed the characteristics of the Defects4DS dataset and compared it to the ITSP dataset, a widely used introductory programming assignment dataset. The analysis revealed that the bugs in Defects4DS are more challenging to locate and fix due to the presence of complex grammatical components, related bugs, and a higher proportion of variable-related bugs.
- To address the challenges in repairing advanced student assignments, the authors proposed the Peer-aided Repairer (PaR) framework. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
  - Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria.
  - Multi-Source Prompt Generation combines multiple sources of information, including the peer solution, program description, I/O-related information, and buggy code, into a comprehensive prompt for the Program Repair phase.
  - Program Repair feeds the generated prompt to a large language model, which then produces the fixed code.
- The evaluation on Defects4DS and the ITSP dataset shows that PaR achieves new state-of-the-art performance, improving the repair rate by 19.94% and 15.2% over prior state-of-the-art LLM-based and symbolic-based approaches, respectively.
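The three PaR phases described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the summary does not say which lexical, semantic, or syntactic metrics PaR uses, so token-set overlap and token-sequence similarity serve here as simple stand-ins, and the names `Peer`, `select_peer`, `build_prompt`, and `repair` are hypothetical. The `llm` argument is assumed to be any callable that maps a prompt string to a completion string.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    code: str


def tokens(code: str) -> list[str]:
    # Crude lexer: identifiers, integers, and single non-space characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def lexical_score(a: str, b: str) -> float:
    # Jaccard overlap of token sets (assumed stand-in for a lexical metric).
    ta, tb = set(tokens(a)), set(tokens(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sequence_score(a: str, b: str) -> float:
    # Token-sequence similarity (assumed stand-in for the semantic and
    # syntactic criteria, which would need embeddings or ASTs in practice).
    return difflib.SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def select_peer(buggy: str, peers: list[Peer]) -> Peer:
    # Phase 1: choose the peer solution closest to the buggy program.
    return max(
        peers,
        key=lambda p: lexical_score(buggy, p.code) + sequence_score(buggy, p.code),
    )


def build_prompt(description: str, io_info: str, peer: Peer, buggy: str) -> str:
    # Phase 2: merge the four information sources into one prompt.
    sections = [
        ("Problem description", description),
        ("Input/output information", io_info),
        ("Reference peer solution", peer.code),
        ("Buggy program", buggy),
    ]
    body = "\n\n".join(f"### {title}\n{text}" for title, text in sections)
    return body + "\n\nFix the buggy program and return only the corrected code."


def repair(prompt: str, llm) -> str:
    # Phase 3: delegate the actual fix to the language model.
    return llm(prompt)
```

In a real system the candidate fix returned by `repair` would still need to be validated, for example by running the assignment's test cases, before being counted as a successful repair.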
Statistics
The authors report the following key statistics:
The average and median numbers of lines of code in Defects4DS are 55 and 78, respectively, much higher than the average of 22 and median of 20 in the ITSP dataset.
38.6% of the Defects4DS programs contain complex grammatical components (struct, pointer, multi-dimensional array), while none are present in the ITSP dataset.
42.7% of the Defects4DS programs contain custom functions, compared to 20.5% in the ITSP dataset.
Quotes
"Automated Program Repair (APR) techniques can automatically generate patches to correct code errors by reasoning about the code semantics based on the given specification."
"Recent advancements in the development of Large Language Models (LLMs) provide an alternative solution for bug repair that does not necessitate experts with program analysis/repair experience."