Core Concepts
LLM-generated code exhibits recurring bug patterns, such as Misinterpretation, Missing Corner Case, and Non-Prompted Consideration, which point to reliability risks in automatic code generation.
Abstract
Large Language Models (LLMs) for code generation have gained attention, but the code they generate is prone to bugs. This study examines 333 bugs from leading LLMs and identifies 10 distinctive bug patterns: Misinterpretation, Syntax Error, Silly Mistake, Prompt-biased Code, Missing Corner Case, Wrong Input Type, Hallucinated Object, Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The study reports how prevalent these bug patterns are and how significant they are to LLM practitioners and researchers.
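To make one of these patterns concrete, the following is a minimal sketch of what a Missing Corner Case bug might look like. It is an illustration only and is not taken from the study's dataset; the function names `mean_buggy` and `mean_fixed`, and the choice to return `None` for empty input, are assumptions made for this example.

```python
def mean_buggy(values):
    # Hypothetical generated code: correct for typical inputs, but it
    # overlooks the empty-list corner case and raises ZeroDivisionError.
    return sum(values) / len(values)


def mean_fixed(values):
    # Handles the corner case explicitly. Returning None for an empty
    # list is an assumed convention, not something the study prescribes.
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_fixed([1, 2, 3]))
    print(mean_fixed([]))
    try:
        mean_buggy([])
    except ZeroDivisionError:
        print("mean_buggy fails on an empty list")
```

A test suite that only exercises typical inputs would pass both versions, which is why this kind of bug tends to surface only when edge cases are tested deliberately.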
Stats
The 10 identified bug patterns include Misinterpretation, Syntax Error, and Silly Mistake.
Missing Corner Case is the most common bug pattern in Codex.
Hallucinated Object is a prevalent bug pattern in PanGu-Coder.
Misinterpretation is the most common bug pattern across all models.
Quotes
"No model generates the same vulnerabilities as humans."
"LLM-generated code may deceive non-experienced users without proper testing."
"Popular quality assurance techniques depend on precise characterization of faults."