This paper presents an extensive evaluation of ChatGPT on five software vulnerability tasks: detection, assessment, localization, repair, and description, and compares it against state-of-the-art approaches for each task. The findings clarify where ChatGPT is strong and where it falls short in handling software vulnerabilities.
The results show that Large Language Models (LLMs) perform unevenly across these tasks. While they do well in some respects, they still struggle to recognize subtle differences among code vulnerabilities and to generate accurate vulnerability descriptions, and must improve on both fronts to realize their full potential.
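For context, the sketch below shows the kind of vulnerability-detection query such an evaluation issues to ChatGPT. It is a minimal illustration assuming the OpenAI Python client (openai >= 1.0); the model name, the hypothetical vulnerable C snippet, and the prompt wording are placeholders, and the paper's actual prompts and answer parsing may differ.

```python
# Minimal sketch of a vulnerability-detection query, assuming the
# OpenAI Python client (openai >= 1.0). Prompt wording, model choice,
# and the example snippet are illustrative, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical snippet with a classic stack buffer overflow (CWE-121).
code_snippet = """
void copy_name(const char *input) {
    char buf[8];
    strcpy(buf, input);  /* no bounds check */
}
"""

prompt = (
    "Is the following C function vulnerable? "
    "Answer 'yes' or 'no', then name the CWE if it is vulnerable.\n\n"
    + code_snippet
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output for reproducible evaluation
)
print(response.choices[0].message.content)
```

Setting temperature to 0 keeps the model's answer deterministic, which makes scoring repeatable when the same query is run over an entire benchmark.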