Core concepts
Vul-LMGNN is a novel deep learning model that integrates pre-trained code language models with code property graphs to detect vulnerabilities in source code.
Summary
The paper proposes a unified deep learning model, Vul-LMGNN, that combines the strengths of pre-trained code language models and code property graphs for efficient source code vulnerability detection.
Key highlights:
- Vul-LMGNN constructs a comprehensive code property graph (CPG) that integrates various code attributes, including syntax, control flow, and data dependencies, into a unified graph structure.
- It leverages a pre-trained code language model, CodeBERT, to extract local semantic features as node embeddings in the CPG.
- To effectively capture dependency information among code attributes, Vul-LMGNN introduces a gated code Graph Neural Network (GNN) module.
- The model jointly trains the code language model and the gated code GNN to leverage the complementary advantages of both mechanisms.
- An auxiliary classifier based on the pre-trained CodeBERT is used to further enhance the model's performance through linear interpolation of predictions.
- Extensive experiments on four real-world vulnerability datasets demonstrate the superior performance of Vul-LMGNN compared to six state-of-the-art approaches.
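The linear interpolation of predictions mentioned in the highlights can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the weight `lam`, and the choice to apply `lam` to the GNN branch and `1 - lam` to the CodeBERT branch are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_predictions(gnn_logits, lm_logits, lam=0.5):
    """Blend two classifiers' class probabilities.

    lam weights the graph-based prediction and (1 - lam) weights the
    language-model prediction. Both the name `lam` and this assignment
    of weights are assumptions for illustration.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(gnn_logits) != len(lm_logits):
        raise ValueError("both classifiers must score the same classes")
    p_gnn = softmax(gnn_logits)
    p_lm = softmax(lm_logits)
    return [lam * g + (1.0 - lam) * l for g, l in zip(p_gnn, p_lm)]


# Hypothetical logits for the two classes (non-vulnerable, vulnerable).
probs = interpolate_predictions([2.0, 0.5], [0.2, 1.1], lam=0.7)
label = max(range(len(probs)), key=probs.__getitem__)
```

With `lam=1.0` only the graph-based prediction is used, and with `lam=0.0` only the language-model prediction. Intermediate values let a validation set decide how much each branch contributes.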
Statistics
Vul-LMGNN achieves an accuracy of 93.06% and an F1-score of 23.54% on the DiverseVul dataset.
Vul-LMGNN attains an accuracy of 84.38% and an F1-score of 83.87% on the balanced version of the Draper VDISC dataset.
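The gap between a high accuracy and a low F1-score on one dataset, and near-equal values on a balanced one, is what class imbalance typically produces. The sketch below shows the arithmetic with invented confusion-matrix counts; they are not taken from the paper and were chosen only to land in a similar range.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1 for the positive (vulnerable) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts: 90 vulnerable samples among 2000 (4.5% positive).
acc, f1 = accuracy_and_f1(tp=20, fp=60, fn=70, tn=1850)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")
```

Here the many correctly classified non-vulnerable samples push accuracy to 93.5%, while F1 stays near 23.5% because it ignores true negatives and depends only on how well the rare vulnerable class is found.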
Quotes
"To address current challenges, we propose Vul-LMGNN, a novel vulnerability detection approach that combines the strengths of both pre-trained code language models (code-PLM) and GNN."
"By jointly training codeBERT with GGNN, the proposed method implicitly fuses contextual information from code sequences with diverse information within the code property graph."