Key Concepts
AI-powered programming assistants can raise task completion rates, shorten completion time, improve code quality, and increase self-perceived productivity, but their impact varies across tasks and user experience levels.
Abstract
The authors simulate realistic software development scenarios and conduct a controlled human study with 27 computer science students to evaluate the effectiveness and usability of three popular AI coding assistant tools (ACATs): GitHub Copilot, Tabnine, and CodeGeeX.
The key findings include:
ACATs generally increase task completion rates, which in turn reduces task completion time overall. For highly experienced users, however, ACATs may even increase completion time.
ACATs enhance code quality, with GitHub Copilot producing the largest improvement. The "Management System Development" task is the one most affected by ACATs across multiple dimensions.
The acceptance rate of ACATs' code recommendations varies by task and by tool. Across tasks, it peaks at 40% for the "Management System Development" task. Across tools, CodeGeeX has the highest acceptance rate at 49%, while Tabnine has the lowest at 24%.
Over 55% of participants find programming more strenuous without an ACAT's help, indicating that ACATs improve participants' self-perceived productivity.
Regarding recommended code characteristics, "edited line completion" is the most frequent recommendation, while "comments completion" and "string completion" have the lowest acceptance rates.
Participants' decisions to accept, reject, or modify ACAT recommendations are influenced by 22 factors, including fulfilling functional requirements, reducing keystrokes, and addressing logical errors.
Participants report challenges with ACATs, such as poor performance on complex logic, limited natural language understanding, and recommendation lengths that do not match their differing preferences. They also express 23 expectations, including enhanced natural language interaction, optimized ranking of suggestions, and adaptive learning of personal coding styles.
The findings provide valuable insights to inform the further improvement of AI-powered programming assistants and enhance their support for developers.
Statistics
"ACATs generally enhance task completion rates, resulting in a general reduction in task completion time."
"For highly experienced users, ACATs may even increase completion time."
"GitHub Copilot brings the largest code quality score difference (1.61, ↑54%)."
"The acceptance rate of ACATs' code recommendations peaks at 40% for the 'Management System Development' task."
"Over 55% of participants feel programming more strenuous without ACAT's help."
Quotes
"I accept this because it's a long code, it even has most of the structure already written for me, which saved me a lot of time."
"ACAT misunderstood his intention, as he intended to complete the code based on his comments, but the ACAT instead completed his comments."