Core Concepts
Fine-tuned large language models, particularly Mistral, show promise in automating code vulnerability repair. They outperform existing methods even under stricter evaluation metrics, and the results highlight the importance of dataset integrity when assessing model performance.
Stats
The refined dataset used for training contained 4,163 samples, while the test set consisted of 1,706 samples.
Mistral achieved a "Perfect Prediction" rate of 25.67% with a beam size of 5 on the refined dataset, surpassing VulMaster's reported rate of 20.0%.
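The "Perfect Prediction" metric counts a sample as correct when at least one of the beam candidates exactly matches the ground-truth fix. A minimal sketch of that computation (function and variable names are illustrative, not from the study):

```python
def perfect_prediction_rate(beam_outputs, references):
    """Fraction of samples where any beam candidate exactly matches
    the reference fix (whitespace-trimmed comparison)."""
    hits = sum(
        1
        for beams, ref in zip(beam_outputs, references)
        if any(candidate.strip() == ref.strip() for candidate in beams)
    )
    return hits / len(references)

# Toy example with beam size 2 (hypothetical patches):
beams = [
    ["free(ptr); ptr = NULL;", "free(ptr);"],   # second beam matches
    ["return -1;", "return 0;"],                 # no beam matches
]
refs = ["free(ptr);", "return NULL;"]
print(perfect_prediction_rate(beams, refs))  # → 0.5
```

With a beam size of 5, each sample gets five chances to produce an exact match, which is why the metric is typically reported together with the beam size.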
The study found an overlap of approximately 40% between the training and test samples in the original datasets used by previous studies.
Mistral's accuracy dropped from 57% on the original dataset to 26% on the refined dataset when using beam search with a beam size of 5, suggesting that train-test leakage had inflated the earlier scores.