The study explores whether code stylometry combined with machine learning can differentiate human-authored code from GPT-4-generated code. On a dataset drawn from CodeChef, the classifier outperforms the baselines, achieving both an F1-score and an AUC-ROC of 0.91. The study also evaluates performance across problems of varying difficulty, with promising results at each level.
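As a minimal sketch of how the two reported metrics are typically computed, the snippet below uses scikit-learn's `f1_score` and `roc_auc_score`. The labels and scores are toy placeholders, not the paper's data; the F1-score needs hard predictions, while AUC-ROC needs the classifier's probability scores.

```python
# Hedged sketch: computing F1 and AUC-ROC with scikit-learn.
# All values below are hypothetical, for illustration only.
from sklearn.metrics import f1_score, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]   # 1 = GPT-4 generated (toy labels)
y_pred = [0, 0, 1, 1, 0, 0]   # hard predictions from a classifier
y_score = [0.1, 0.2, 0.9, 0.8, 0.4, 0.3]  # predicted probabilities

f1 = f1_score(y_true, y_pred)        # uses hard 0/1 predictions
auc = roc_auc_score(y_true, y_score)  # uses ranking of scores
```

Note that AUC-ROC is threshold-free (it measures how well scores rank positives above negatives), whereas F1 depends on the chosen decision threshold.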
The paper discusses the impact of AI assistants such as GitHub Copilot and ChatGPT on programming education, focusing on detecting AI-generated solutions in academic settings. It highlights concerns about academic dishonesty when students submit AI-generated code as their own work. The study combines layout, syntactic, and lexical features with Halstead's metrics to build a classifier that distinguishes human-authored from GPT-4-generated code. The results show high accuracy in identifying GPT-4-generated solutions by their distinctive coding styles.
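The pipeline described above can be sketched roughly as follows: derive simplified Halstead-style counts (distinct and total operators/operands, program length, volume) from source code and feed them to a classifier. This is an illustrative sketch, not the paper's exact feature set or model; it treats all names as operands (real Halstead counting distinguishes keywords), and the training labels are placeholders.

```python
# Hedged sketch: simplified Halstead-style features plus a classifier.
# Not the paper's pipeline; feature definitions and model are assumptions.
import io
import math
import tokenize
from sklearn.ensemble import RandomForestClassifier

def halstead_features(source: str):
    """Return [n1, n2, length, volume] for a Python snippet.

    n1 = distinct operators, n2 = distinct operands,
    length = total operator + operand occurrences,
    volume = length * log2(vocabulary).
    Simplification: every NAME token counts as an operand.
    """
    operators, operands = set(), set()
    n_ops = n_opnds = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            operators.add(tok.string)
            n_ops += 1
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.add(tok.string)
            n_opnds += 1
    n1, n2 = len(operators), len(operands)
    vocabulary = n1 + n2
    length = n_ops + n_opnds
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    return [n1, n2, length, volume]

# Toy training set: label 0 = "human", 1 = "generated" (placeholder labels).
samples = [
    "def f(x):\n    return x + 1\n",
    "y = [i * i for i in range(10)]\n",
]
X = [halstead_features(s) for s in samples]
y = [0, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
```

In practice such stylometric features would be extracted for the target language of the dataset and combined with the layout and lexical features mentioned above before training.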
Key insights extracted from arxiv.org by Oseremen Joy..., 03-08-2024
https://arxiv.org/pdf/2403.04013.pdf