Bibliographic Information: Li, M., & Krishnamachari, B. (2024). Evaluating ChatGPT-3.5 Efficiency in Solving Coding Problems of Different Complexity Levels: An Empirical Analysis. arXiv preprint arXiv:2411.07529v1.
Research Objective: This paper investigates the effectiveness of ChatGPT, primarily the GPT-3.5-turbo model, in solving coding problems of varying difficulty levels on the LeetCode platform. The research explores the impact of prompt engineering techniques and examines the model's performance across different programming languages.
Methodology: The researchers used a dataset of 1,475 LeetCode problems categorized into easy, medium, and hard levels. They developed Python scripts to automate interactions with the ChatGPT API, submitting prompts and collecting responses. Correctness of the generated code was evaluated with LeetCode's integrated compiler and test cases. Two prompt engineering techniques, chain-of-thought prompting and incorporating failed test cases into the prompt, were applied to measure their effect on performance. The study also compared GPT-3.5-turbo with GPT-4, Claude 3 Sonnet, and Gemini 1.0 Pro, and evaluated the model's proficiency across several programming languages, including Python, C++, Java, Elixir, Erlang, and Racket.
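The workflow described above can be illustrated with a minimal Python sketch. It is not the authors' script: the prompt wording, the function names (build_prompt, solve_with_feedback, acceptance_by_difficulty), and the retry limit are all assumptions made for illustration, and the model call and the LeetCode judge are passed in as plain callables so that no real API is invoked.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


def build_prompt(problem: str, language: str,
                 chain_of_thought: bool = False,
                 failed_case: Optional[str] = None) -> str:
    """Assemble a prompt. The wording is hypothetical, not taken from the paper."""
    parts = [f"Solve the following LeetCode problem in {language}.", problem]
    if chain_of_thought:
        # Chain-of-thought: ask for the reasoning before the code.
        parts.append("Think through the approach step by step before writing the code.")
    if failed_case is not None:
        # Feedback: include the test case the previous attempt failed.
        parts.append(f"Your previous solution failed this test case:\n{failed_case}\n"
                     "Fix the code.")
    return "\n\n".join(parts)


def solve_with_feedback(problem: str, language: str,
                        ask_model: Callable[[str], str],
                        judge: Callable[[str], Tuple[bool, Optional[str]]],
                        max_attempts: int = 2,
                        chain_of_thought: bool = False) -> bool:
    """Query the model, judge the code, and retry with the failed test case.

    ask_model stands in for a chat-completion API call; judge stands in for
    LeetCode's compiler and test cases and returns (accepted, failed_case).
    """
    failed_case: Optional[str] = None
    for _ in range(max_attempts):
        prompt = build_prompt(problem, language, chain_of_thought, failed_case)
        code = ask_model(prompt)
        accepted, failed_case = judge(code)
        if accepted:
            return True
    return False


def acceptance_by_difficulty(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Turn (difficulty, accepted) pairs into an acceptance rate per difficulty."""
    totals: Dict[str, int] = defaultdict(int)
    solved: Dict[str, int] = defaultdict(int)
    for difficulty, accepted in results:
        totals[difficulty] += 1
        solved[difficulty] += int(accepted)
    return {d: solved[d] / totals[d] for d in totals}
```

Keeping the model and the judge as injected callables means the same loop could be pointed at GPT-3.5-turbo, GPT-4, or another model, and at any target language, without changing the evaluation logic.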
Key Findings:
Main Conclusions: ChatGPT shows promise for automated code generation but exhibits limitations in handling complex algorithms, certain programming languages, and specific problem types. Prompt engineering plays a crucial role in enhancing performance, and model advancements, like GPT-4, contribute to improved problem-solving capabilities.
Significance: This research provides valuable insights into the strengths and weaknesses of ChatGPT in code generation tasks, informing future research and development efforts in automated coding assistance.
Limitations and Future Research: The study primarily focused on LeetCode problems, which may not fully represent real-world coding scenarios. Future research could explore ChatGPT's performance on more diverse and complex coding tasks, investigate the impact of different prompt engineering techniques, and develop language-specific optimizations to enhance the model's capabilities across a wider range of programming languages.
Key Insights Distilled From
by Minda Li, Bh... at arxiv.org, 11-13-2024
https://arxiv.org/pdf/2411.07529.pdf
Deeper Questions