Key Concepts
The author explores the quality of feedback generated by GPT-4 Turbo for programming exercises, highlighting improvements and limitations compared to previous models.
Abstract
The study evaluates the feedback quality of GPT-4 Turbo for programming exercises, noting improvements in structure and correctness. The feedback is personalized and detailed, providing suggestions for optimization and coding style. However, inconsistencies, redundancies, and misleading information were also identified. Future research should focus on pedagogical integration and privacy concerns.
Statistics
Large Language Models (LLMs) such as Codex, GPT-3.5, and GPT-4 have shown promising results in large programming courses.
GPT-4 was asked to generate feedback for 55 student submissions from an introductory programming course.
Compared to prior work with GPT-3.5, GPT-4 Turbo shows notable improvements in structured output.
In some cases, the generated feedback includes the (predicted) output of the student program.
The accuracy of feedback seems to improve when the model receives task instructions as input.
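The observation above — that feedback accuracy seems to improve when the model receives the task instructions as input — can be sketched as a prompt-construction step. This is a minimal, hypothetical example, not the study's actual setup: the function name, section labels, and wording are illustrative assumptions.

```python
# Hypothetical sketch: assemble a single feedback prompt that pairs the
# task instructions with the student submission, so the model can judge
# correctness against the assignment rather than guessing the intent.

def build_feedback_prompt(task_instructions: str, student_code: str) -> str:
    """Combine task instructions and a student submission into one prompt.

    The section headings below are illustrative conventions,
    not a format documented in the study.
    """
    return (
        "You are a tutor in an introductory programming course.\n"
        "Give structured feedback on correctness, coding style, "
        "and possible optimizations.\n\n"
        f"## Task instructions\n{task_instructions}\n\n"
        f"## Student submission\n```python\n{student_code}\n```\n"
    )

prompt = build_feedback_prompt(
    task_instructions="Write a function mean(xs) that returns the average "
                      "of a non-empty list of numbers.",
    student_code="def mean(xs):\n    return sum(xs) / len(xs)",
)
print(prompt)
```

The resulting string would then be sent to the model (e.g. via a chat-completion API call, omitted here); the point is only that the assignment text travels with the code.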