Core Concepts
Programmers frequently defer verifying AI-generated code suggestions, leading to significant time costs and inefficiencies in the programming workflow.
Abstract
The paper presents a novel taxonomy called CodeRec User Programming States (CUPS) to model and understand how programmers interact with AI-based code recommendation systems, such as GitHub Copilot. The key insights from the study are:
Programmers spend a significant portion of their time (over 50%) in states specific to interacting with the code recommendation system, such as verifying suggestions, crafting prompts, and deferring thought on accepted suggestions.
Analyzing the transitions between CUPS states reveals patterns in programmer behavior, such as cycles of writing new code and verifying suggestions, as well as iterating on prompts to get desired suggestions.
Programmers often defer verifying AI-generated suggestions, accepting them without fully checking the code. This leads to a higher likelihood of having to later edit the accepted suggestions.
The programmer's current CUPS state has a significant impact on the probability of accepting a suggestion. For example, when the programmer is in the "Deferring Thought For Later" state, the acceptance probability is nearly 100% (0.98 ± 0.02), compared to only about 12% (0.12 ± 0.04) when the programmer is "Thinking about New Code to Write".
The authors argue that understanding programmer behavior through the CUPS taxonomy can inform the design of better interfaces and metrics for AI-assisted programming tools, as well as enable more accurate modeling of the time costs associated with these interactions.
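As a rough illustration of the two analyses described above (transitions between states and acceptance probability conditioned on state), the following minimal sketch works on a hand-made list of labeled events. The event format, the function names, and every value in `events` are assumptions made for illustration; they are not the study's data or the authors' code.

```python
from collections import Counter

# Hypothetical labeled events: (CUPS state, suggestion shown?, suggestion accepted?).
# Toy values for illustration only, NOT data from the study.
events = [
    ("Writing New Functionality", False, False),
    ("Thinking/Verifying Suggestion", True, True),
    ("Writing New Functionality", False, False),
    ("Thinking/Verifying Suggestion", True, False),
    ("Deferring Thought For Later", True, True),
    ("Thinking About New Code To Write", True, False),
]


def transition_counts(states):
    """Count how often each state is immediately followed by another state."""
    return Counter(zip(states, states[1:]))


def acceptance_by_state(labeled_events):
    """Estimate P(accept | state) over events where a suggestion was shown."""
    shown = Counter()
    accepted = Counter()
    for state, was_shown, was_accepted in labeled_events:
        if was_shown:
            shown[state] += 1
            accepted[state] += int(was_accepted)
    return {state: accepted[state] / shown[state] for state in shown}


transitions = transition_counts([state for state, _, _ in events])
rates = acceptance_by_state(events)

# Print the most frequent transitions first, then per-state acceptance rates.
for (src, dst), count in transitions.most_common():
    print(f"{src} -> {dst}: {count}")
for state, rate in rates.items():
    print(f"P(accept | {state}) = {rate:.2f}")
```

On this toy log, the write-then-verify cycle appears as a repeated transition from "Writing New Functionality" to "Thinking/Verifying Suggestion". Real sessions would need far more events per state before the per-state rates mean anything.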
Stats
The average acceptance rate of Copilot suggestions across all participants was 34%.
Programmers spent an average of 22.4% of their time in the "Thinking/Verifying Suggestion" state.
Programmers spent an average of 14.05% of their time in the "Writing New Functionality" state.
The total time spent in states specific to interacting with the code recommendation system was 51.5% of the average session duration.
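The time-share figures above can be thought of as durations summed per state and divided by session length. The sketch below shows that arithmetic for a single hypothetical session; the segment format, the durations, and the choice of which states count as specific to the recommendation system (`CODEREC_STATES`) are all assumptions for illustration, and the study reports averages across participants rather than a single session.

```python
from collections import defaultdict

# Hypothetical labeled segments: (CUPS state, duration in seconds).
# Toy values for illustration only, NOT data from the study.
segments = [
    ("Thinking/Verifying Suggestion", 30.0),
    ("Writing New Functionality", 20.0),
    ("Deferring Thought For Later", 10.0),
    ("Thinking/Verifying Suggestion", 20.0),
    ("Writing New Functionality", 20.0),
]

# Assumption: which states count as specific to the code recommendation system.
CODEREC_STATES = {
    "Thinking/Verifying Suggestion",
    "Deferring Thought For Later",
}


def time_share(labeled_segments):
    """Return the fraction of total session time spent in each state."""
    totals = defaultdict(float)
    for state, duration in labeled_segments:
        totals[state] += duration
    session_length = sum(totals.values())
    return {state: total / session_length for state, total in totals.items()}


shares = time_share(segments)
coderec_share = sum(
    share for state, share in shares.items() if state in CODEREC_STATES
)

for state, share in shares.items():
    print(f"{state}: {share:.1%}")
print(f"CodeRec-specific states: {coderec_share:.1%}")
```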
Quotes
"Deference incurs verification debt, and this debt often 'catches up' with the programmer."
"If the programmer was in the 'Deferring Thought For Later' state, the probability of acceptance is 0.98 ± 0.02 compared to when a programmer is thinking about new code to write, where the probability is 0.12 ± 0.04."