Core Concepts
Improving the classification accuracy of logical errors in programming code by leveraging the relationships between different error types in Large Language Model prompts.
Abstract
The paper presents a comprehensive approach to classifying and augmenting logical errors in programming code using Large Language Models (LLMs). The key highlights are:
- Defining ten types of logical errors and establishing their relationships to address potential confusion and ambiguity in error classification.
- Proposing a new method for detecting logical errors with LLMs that incorporates the relationships between error types into Chain-of-Thought and Tree-of-Thought prompts (see the sketch after this abstract).
- Demonstrating that classification accuracy improves by 21 percentage points when the error relationship information is included in the prompts, compared to when it is not provided.
- Introducing a methodology for generating a logical error dataset by augmenting correct code using LLMs, which can be useful for various programming-related applications.
The authors expect that this work can assist novice programmers in identifying and correcting the causes of logical errors more effectively.
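To make the prompting idea concrete, here is a minimal sketch of a Chain-of-Thought classification prompt that embeds error-type descriptions and their relationships, using the OpenAI Python SDK. The type names, descriptions, and relationship text below are placeholders for illustration, not the paper's exact ten-type taxonomy.

```python
# Hypothetical sketch: a Chain-of-Thought classification prompt that embeds
# error-type descriptions AND the relationships between them. The type names,
# descriptions, and relationship wording are placeholders, not the paper's
# exact ten-type taxonomy.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ERROR_TYPES = {
    "off_by_one": "loop bounds or indices deviate by one from the intent",
    "wrong_condition": "a branch condition does not match the specification",
    "operator_misuse": "an arithmetic or logical operator is incorrect",
}

# Explicit relationships between easily confused types (placeholder wording),
# added to reduce ambiguity between overlapping categories.
RELATIONSHIPS = (
    "off_by_one overlaps with wrong_condition when the mistake is a "
    "comparison bound; prefer off_by_one if only the bound is wrong."
)

def classify_logical_error(code: str, model: str = "gpt-3.5-turbo") -> str:
    type_list = "\n".join(f"- {name}: {desc}" for name, desc in ERROR_TYPES.items())
    prompt = (
        "Classify the logical error in the code below.\n\n"
        f"Error types:\n{type_list}\n\n"
        f"Relationships between types:\n{RELATIONSHIPS}\n\n"
        "Think step by step: describe what the code does, compare it to the "
        "likely intent, then name exactly one error type.\n\n"
        f"Code:\n{code}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In the paper's experiments, including this kind of relationship text was the difference between 35% and 56% classification accuracy for GPT-3.5-turbo.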
Stats
The classification accuracy for GPT-3.5-turbo improved from 35% without error descriptions to 56% with error descriptions, an increase of 21 percentage points.
The classification accuracy for GPT-4 with error descriptions reached 86%.
The False Positive Rate (FPR) decreased from 0.145 for GPT-3.5-turbo to 0.13 for GPT-4, indicating that GPT-4 distinguishes error types more reliably.
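For reference, the standard definition is FPR = FP / (FP + TN), the share of negative cases incorrectly flagged as a given error type; the paper's exact per-type computation may differ in detail.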
The augmentation process generated 111 code samples, of which 49 were classified as "Right Augmentation" and 24 as "Other types of logical errors".
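To illustrate the augmentation methodology from the abstract, here is a minimal sketch of prompting an LLM to inject one named logical error into known-correct code. The prompt wording and function name are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of the dataset-augmentation idea: ask an LLM to inject
# exactly one named logical error into known-correct code. The prompt wording
# is an assumption, not the paper's exact procedure.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def inject_logical_error(correct_code: str, error_type: str,
                         model: str = "gpt-3.5-turbo") -> str:
    prompt = (
        f"Rewrite the following correct code so that it contains exactly one "
        f"logical error of type '{error_type}'. Keep the code syntactically "
        f"valid and change as little as possible. Return only the code.\n\n"
        f"{correct_code}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Runs like this over a corpus of correct solutions would yield labeled buggy/correct pairs, which the paper validates by checking whether the injected error matches the requested type.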
Quotes
"Detecting such errors and developing an approach for assisting the user holds educational potential."
"Understanding error messages is crucial for effective programming learning, and use of LLM based approaches could benefit programming novices."
"We expect that our work can assist novice programmers in identifying the causes of code errors and correct them more effectively."