Core Concepts
The paper proposes VulEval, a holistic evaluation system that simultaneously assesses how well vulnerability detection methods identify both inter-procedural and intra-procedural vulnerabilities.
Abstract
The paper proposes VulEval, a comprehensive evaluation system for software vulnerability detection, which addresses the limitations of existing methods that primarily focus on intra-procedural vulnerabilities and lack a systematic approach for evaluating inter-procedural vulnerabilities.
Key highlights:
- VulEval consists of three interconnected tasks: (1) Function-Level Vulnerability Detection, (2) Vulnerability-Related Dependency Prediction, and (3) Repository-Level Vulnerability Detection.
- The dataset covers C/C++ code and includes 4,196 CVE entries, 232,239 functions, and 4,699 repository-level source code samples, along with 347,533 dependencies, of which 9,538 are vulnerability-related.
- Extensive experiments on 19 vulnerability detection methods and 7 dependency retrieval methods reveal that:
  - Incorporating vulnerability-related dependencies improves the performance of repository-level vulnerability detection compared to function-level detection.
  - Supervised learning- and fine-tuning-based methods exhibit performance degradation in the time-split setting, while program analysis- and prompt-based methods maintain consistent performance.
  - Lexical-based methods outperform semantic-based methods in identifying vulnerability-related dependencies.
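The time-split setting mentioned in the highlights can be illustrated with a minimal sketch. It assumes each sample carries a commit date and that everything before a cutoff is used for training while everything on or after it is held out; the field names, the sample data, and the cutoff are hypothetical and are not taken from the paper.

```python
from datetime import date

def time_split(samples, cutoff):
    """Split samples chronologically: before the cutoff -> train, otherwise -> test."""
    train = [s for s in samples if s["date"] < cutoff]
    test = [s for s in samples if s["date"] >= cutoff]
    return train, test

# Hypothetical samples; "id", "date", and "label" are illustrative field names.
samples = [
    {"id": "f1", "date": date(2019, 5, 1), "label": 1},
    {"id": "f2", "date": date(2021, 3, 9), "label": 0},
    {"id": "f3", "date": date(2023, 1, 20), "label": 1},
]

train, test = time_split(samples, date(2022, 1, 1))
print([s["id"] for s in train], [s["id"] for s in test])
```

Unlike a random split, this keeps the test set strictly later than the training set, so a model cannot learn from vulnerabilities disclosed after the ones it is evaluated on.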
The analysis highlights the current progress and future directions for software vulnerability detection, emphasizing the importance of considering inter-procedural vulnerabilities and effective dependency retrieval techniques.
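As a rough illustration of what a lexical dependency retrieval method might look like, the sketch below ranks candidate callee or caller functions by the Jaccard overlap of their identifier tokens with the target function. Jaccard similarity is an assumed stand-in here; this summary does not say which lexical methods the paper evaluates, and the function names and code snippets are hypothetical.

```python
import re

def tokens(code):
    """Return the set of identifier-like tokens in a code string."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(a, b):
    """Jaccard similarity between the token sets of two code strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def rank_dependencies(target, candidates):
    """Rank candidate functions by lexical similarity to the target, highest first."""
    scored = [(name, jaccard(target, body)) for name, body in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical target function and candidate dependencies.
target = "int copy_buf(char *dst, char *src, int len) { memcpy(dst, src, len); return len; }"
candidates = {
    "read_len": "int read_len(char *src) { int len = src[0]; return len; }",
    "log_msg": "void log_msg(const char *msg) { puts(msg); }",
}

ranking = rank_dependencies(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A semantic method would replace `jaccard` with a similarity over learned embeddings; the ranking interface would stay the same.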
Stats
The number of software vulnerabilities has increased more than fivefold over the past ten years, rising from 5,697 in 2013 to 29,065 in 2023.
The dataset covers C/C++ code and includes 4,196 CVE entries, 232,239 functions, and 4,699 repository-level source code samples.
The dataset also includes 347,533 dependencies and 9,538 vulnerability-related dependencies.
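The figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section; the variable names are illustrative.

```python
# Figures quoted in the Stats section.
vulns_2013, vulns_2023 = 5_697, 29_065
deps_total, deps_vuln = 347_533, 9_538

growth = vulns_2023 / vulns_2013        # growth factor between 2013 and 2023
vuln_dep_share = deps_vuln / deps_total # fraction of dependencies that are vulnerability-related

print(f"growth factor: {growth:.2f}x")
print(f"vulnerability-related share of dependencies: {vuln_dep_share:.2%}")
```

The growth factor comes out slightly above 5, consistent with the "more than fivefold" claim, and fewer than 3% of the extracted dependencies are vulnerability-related, which suggests why retrieving the right ones is hard.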
Quotes
"Despite the demonstrated efficacy of various methods for vulnerability detection, current evaluation frameworks primarily focus on the granularity of individual function or file, failing to fully account for the complexities of vulnerabilities that extend across multiple files or entire repositories."
"Existing work generally conducts the evaluation on randomly split function-/file-level datasets, without considering different scenarios separately and the timeliness. The previous datasets only use the vulnerability patches to construct the dataset, which ignores the corresponding dependencies (e.g., callee and caller) in the repository."