Large Language Models Struggle to Reliably Identify and Reason About Security Vulnerabilities in Code
Large Language Models (LLMs), including the most advanced models such as GPT-4 and PaLM2, exhibit significant limitations in consistently and accurately identifying and reasoning about security vulnerabilities in code.
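To make concrete what "identifying and reasoning about security vulnerabilities" asks of a model, consider a minimal, hypothetical C snippet of the kind such evaluations present: the model must decide whether the code is vulnerable and justify its answer. The function name and the CWE-787-style off-by-one below are illustrative assumptions, not drawn from any specific benchmark.

```c
#include <stddef.h>

/* Hypothetical example: copy src into a 16-byte buffer and
 * NUL-terminate it. The copy loop itself stays in bounds, so the
 * flaw is easy to miss on a quick read. */
void copy_label(char dst[16], const char *src) {
    size_t i = 0;
    while (src[i] != '\0' && i < 16) {  /* stops once i reaches 16 */
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';  /* BUG: when strlen(src) >= 16, i == 16 here, so this
                     * write lands one byte past the buffer (CWE-787,
                     * out-of-bounds write) */
}
```

A reliable analysis must both flag the terminating write as out of bounds for long inputs and avoid flagging the in-bounds copy loop, which is exactly the kind of consistent, fine-grained reasoning the evaluated models struggle to deliver.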