Main Idea
Large Language Models (LLMs) can significantly aid in code reviews by flagging security vulnerabilities and validating software functionality.
Abstract
The paper investigates the use of Large Language Models (LLMs) to assist in code reviews, focusing on two key tasks: flagging security vulnerabilities and validating software functionality. The study applies zero-shot and chain-of-thought prompting to elicit recommendations on expert-written code snippets and seminal datasets. Results show that proprietary models outperform open-source models, and that LLMs can produce detailed descriptions of security vulnerabilities. The experiments highlight the potential of LLMs to improve code review processes.
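The abstract mentions zero-shot and chain-of-thought prompting but does not reproduce the prompts. Below is a minimal, hypothetical sketch of how the two prompting styles for vulnerability flagging might differ; the snippet, wording, and function names are illustrative assumptions, not the paper's actual prompts.

```python
# Illustrative zero-shot vs. chain-of-thought prompts for vulnerability
# flagging. The code snippet and prompt text are hypothetical examples,
# not taken from the paper.

SNIPPET = """\
char buf[16];
strcpy(buf, user_input);  /* unchecked copy into a fixed-size buffer */
"""

def zero_shot_prompt(code: str) -> str:
    # Zero-shot: ask directly for a verdict, with no reasoning scaffold.
    return (
        "Does the following code contain a security vulnerability? "
        "Answer yes or no, then briefly describe any issue found.\n\n"
        + code
    )

def chain_of_thought_prompt(code: str) -> str:
    # Chain-of-thought: ask the model to reason step by step before
    # committing to a final answer (and a CWE identifier, if any).
    return (
        "Analyze the following code for security vulnerabilities. "
        "Think step by step about how each input is used, then give a "
        "final yes/no answer and a CWE identifier if one applies.\n\n"
        + code
    )

print(zero_shot_prompt(SNIPPET))
print(chain_of_thought_prompt(SNIPPET))
```

Either prompt would then be sent to a model (proprietary or open-source) and the returned description compared against the known CWE label, which is how the statistics below are framed.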
Statistics
36.7% of LLM-generated descriptions can be associated with true CWE vulnerabilities.
Text-davinci-003 achieved an accuracy of 95.6% for flagging security vulnerabilities.
GPT-4 had an accuracy of 88.7% for software functionality validation.