Core Concepts
A multi-agent framework that integrates static analysis and dynamic fuzzing to generate secure and functionally correct code by leveraging large language models.
Abstract
The paper introduces AutoSafeCoder, a multi-agent framework that enhances automated code generation by integrating static analysis and dynamic fuzzing. The framework consists of three agents:
- Coding Agent: Generates the initial code from the requirements using a large language model (LLM) such as GPT-4.
- Static Analyzer Agent: Performs static code analysis to detect security vulnerabilities based on the MITRE CWE database and provides feedback to the Coding Agent for vulnerability remediation.
- Fuzzing Agent: Generates diverse input seeds using type-aware mutation and executes the code to detect runtime crashes and errors; the identified issues are then reported back to the Coding Agent (a minimal mutation sketch follows this list).
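For concreteness, here is a minimal sketch of what type-aware mutation might look like in Python. The mutation rules and the helper names `mutate_value` and `fuzz_target` are illustrative assumptions, not the paper's implementation.

```python
import random
import string


def mutate_value(value):
    """Return a type-aware mutation of a seed value (illustrative rules only)."""
    if isinstance(value, bool):          # check bool before int (bool is a subclass of int)
        return not value
    if isinstance(value, int):
        return random.choice([0, -1, value + 1, value * 2, 2**31 - 1])
    if isinstance(value, float):
        return random.choice([0.0, -value, value * 1e6, float("inf")])
    if isinstance(value, str):
        return value + random.choice(["'", "\x00", "A" * 256,
                                      "".join(random.choices(string.printable, k=8))])
    if isinstance(value, list):
        return [mutate_value(v) for v in value] + [None]
    return value                         # unknown types are passed through unchanged


def fuzz_target(func, seed_inputs, iterations=100):
    """Call `func` on mutated copies of the seeds and collect crashing inputs."""
    crashes = []
    for _ in range(iterations):
        args = [mutate_value(arg) for arg in seed_inputs]
        try:
            func(*args)
        except Exception as exc:         # any runtime error counts as a finding
            crashes.append((args, repr(exc)))
    return crashes
```

In the framework, crashing inputs and their exceptions would be the kind of feedback reported back to the Coding Agent for repair.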
The iterative collaboration among these agents is designed to ensure that the generated code is both secure and functionally correct. Experiments on the SecurityEval dataset demonstrate a 13% reduction in vulnerabilities compared to baseline LLMs, while maintaining high functionality.
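The iterative loop can be sketched roughly as follows, assuming hypothetical agent interfaces (`generate`, `find_vulnerabilities`, `find_crashes`, `repair`) and a fixed round limit; the paper's actual coordination logic may differ.

```python
def autosafecoder_loop(requirement, coding_agent, static_analyzer_agent,
                       fuzzing_agent, max_rounds=5):
    """Hypothetical coordination loop: generate code, then analyze, fuzz, and repair."""
    code = coding_agent.generate(requirement)                         # initial LLM-generated code
    for _ in range(max_rounds):
        findings = static_analyzer_agent.find_vulnerabilities(code)  # CWE-based static findings
        findings += fuzzing_agent.find_crashes(code)                  # runtime crashes from fuzzing
        if not findings:
            break                                                     # no remaining issues: accept the code
        code = coding_agent.repair(code, findings)                    # feed issues back for remediation
    return code
```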
Key highlights:
- Leverages LLMs for code generation, static analysis, and dynamic fuzzing in a multi-agent system.
- Employs few-shot, in-context learning techniques to enable effective vulnerability identification (see the prompt sketch after this list).
- Comprehensive evaluation shows improved security without compromising functionality.
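As a rough illustration of the few-shot, in-context prompting idea, the sketch below assembles a CWE-oriented analysis prompt from labeled examples. The example pairs, prompt wording, and the function name `build_analysis_prompt` are invented for illustration and do not reproduce the paper's prompts.

```python
# Illustrative few-shot examples pairing code snippets with CWE findings.
FEW_SHOT_EXAMPLES = [
    {
        "code": "query = \"SELECT * FROM users WHERE name = '\" + name + \"'\"",
        "finding": "CWE-89: SQL injection via unsanitized string concatenation",
    },
    {
        "code": "subprocess.call(user_input, shell=True)",
        "finding": "CWE-78: OS command injection through shell=True",
    },
]


def build_analysis_prompt(code_snippet):
    """Assemble an in-context prompt asking the LLM to report CWE findings."""
    parts = ["You are a static analyzer. Report CWE identifiers and a short reason."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Code:\n{ex['code']}\nFinding: {ex['finding']}")
    parts.append(f"Code:\n{code_snippet}\nFinding:")
    return "\n\n".join(parts)
```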
Statistics
The paper reports that the use of LLMs as coding assistants can increase the occurrence of vulnerabilities by 10%.
A recent report from IBM Research estimates that software vulnerabilities cost companies an average of $3.9 million annually.
Globally, the cost of security breaches is projected to exceed $1.75 trillion between 2021 and 2025.
Quotes
"Code vulnerabilities pose significant risks, making it crucial to assist developers in mitigating these issues."
"While efforts like VUDDY, MVP, and Movery have focused on identifying Vulnerable Code Clones (VCC), they generally overlook vulnerability repair."
"Recent work has demonstrated the potential of pre-trained LLMs for automating this process, but research such as VulRepair and AIBUGHUNTER lacks dynamic execution-based techniques to assess whether LLM-generated code is vulnerable."