The paper explores the use of large language models, specifically GPT-4, to generate feedback-ladders for logical errors in introductory programming assignments. A feedback-ladder consists of multiple levels of feedback, ranging from a simple correct/incorrect verdict to detailed suggestions on how to modify the code.
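To make the ladder idea concrete, the sketch below models the levels as an ordered structure and builds a per-level prompt for an LLM. This is a minimal illustration only: the level names, their ordering, and the prompt wording are assumptions for exposition, not the paper's actual ladder definition or prompts.

```python
from enum import IntEnum

class FeedbackLevel(IntEnum):
    """Illustrative feedback-ladder levels, ordered from least to most revealing.
    The exact levels and wording here are assumptions, not the paper's."""
    VERDICT = 1         # correct/incorrect only
    FAILING_TEST = 2    # show an input on which the submission fails
    ERROR_LOCATION = 3  # point to where the logical error is
    EXPLANATION = 4     # explain the logical error
    CODE_EDIT = 5       # suggest a concrete fix

def build_prompt(problem: str, submission: str, level: FeedbackLevel) -> str:
    """Compose a hypothetical prompt asking an LLM for feedback at one ladder level."""
    instructions = {
        FeedbackLevel.VERDICT: "State only whether the submission is correct or incorrect.",
        FeedbackLevel.FAILING_TEST: "Give one input on which the submission produces a wrong output.",
        FeedbackLevel.ERROR_LOCATION: "Identify the part of the code containing the logical error, without fixing it.",
        FeedbackLevel.EXPLANATION: "Explain the logical error in plain language.",
        FeedbackLevel.CODE_EDIT: "Suggest a minimal code edit that fixes the logical error.",
    }
    return (
        f"Problem statement:\n{problem}\n\n"
        f"Student submission:\n{submission}\n\n"
        f"Feedback instruction: {instructions[level]}"
    )

if __name__ == "__main__":
    # A teacher-facing tool could generate the whole ladder and let the teacher pick a level.
    ladder = {level: build_prompt("Sum a list of integers.", "def total(xs): ...", level)
              for level in FeedbackLevel}
    print(ladder[FeedbackLevel.FAILING_TEST])
```

In a real pipeline each prompt would be sent to GPT-4 and the responses collected into one ladder per submission; the teacher then decides which rung to reveal to the student.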
The key findings from the paper are:
The quality of the generated feedback is generally consistent across different programming problems, with lower-level feedback (e.g., providing a test case) being more relevant and effective than higher-level feedback (e.g., suggesting code edits).
The quality of the generated feedback is higher for low-scoring submissions than for high-scoring ones. This is because it is easier for the language model to identify mistakes in low-scoring submissions, whereas high-scoring submissions often contain subtle errors that are harder to detect.
The feedback-ladder approach can be a useful tool for teachers, as they can choose the appropriate level of feedback to show to students based on their individual needs and learning contexts.
The paper also discusses potential future work, such as conducting real-classroom studies to measure the actual learning gains of students using the generated feedback, and training dedicated language models for the task of feedback-ladder generation.
Source: https://arxiv.org/pdf/2405.00302.pdf (by Hasnain Heic..., arxiv.org, 05-02-2024)