Core Concepts
Constrained decoding techniques can effectively generate code that is both secure and functionally correct, outperforming the state-of-the-art defense of prefix tuning.
Abstract
The paper introduces a new benchmark, CodeGuard+, to evaluate the security and functional correctness of code generated by Code Large Language Models (Code LLMs). It proposes two new metrics, secure-pass@k and secure@k_pass, that measure the likelihood of generating code that is both secure and functionally correct, rather than scoring the two properties in isolation.
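A metric of this shape can be estimated the same way as the standard pass@k estimator, with the count of passing samples replaced by the count of samples that are both secure and functionally correct. The sketch below assumes that estimator form (1 - C(n-c, k)/C(n, k)); the paper's exact definition may differ in details.

```python
from math import comb

def secure_pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate secure-pass@k for one problem: the probability that at
    least one of k samples drawn from n generations is BOTH secure and
    functionally correct, where c of the n generations qualify.
    Assumes the same unbiased estimator form as the standard pass@k."""
    if n - c < k:
        # every size-k draw must contain at least one qualifying sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 generations, 30 of them both secure and correct
print(secure_pass_at_k(100, 30, 1))  # 0.3 (up to float rounding)
```

The key difference from plain pass@k is only what `c` counts: a sample that passes the tests but contains a vulnerability does not qualify, which is exactly the gap the paper argues the SVEN security rate leaves open.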
The paper explores a new defense direction using constrained decoding techniques to generate secure and correct code. It formulates the problem of constrained decoding for secure code generation, specifies correctness and security constraints, and proposes two constrained decoding techniques: Constrained Beam Sampling and a gradient-based approach adapted from MuCoLa.
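The core idea of constrained beam sampling can be illustrated with a toy sketch: sample several continuations per beam, discard any partial sequence that violates a security constraint, and keep the highest-probability beams. The interface below (`next_token_dist`, `violates`) is hypothetical, standing in for a real Code LLM and a security checker; it is not the paper's implementation.

```python
import random

def constrained_beam_sample(next_token_dist, violates, beam_width=3,
                            samples_per_beam=5, max_len=8, seed=0):
    """Toy sketch of constrained beam sampling (hypothetical interface).
    next_token_dist(seq) -> dict[token, prob] plays the role of the model;
    violates(seq) -> bool plays the role of the security constraint."""
    rng = random.Random(seed)
    beams = [((), 1.0)]  # (token sequence, cumulative probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            dist = next_token_dist(seq)
            tokens, probs = zip(*dist.items())
            for _ in range(samples_per_beam):
                tok = rng.choices(tokens, weights=probs)[0]
                new_seq = seq + (tok,)
                if not violates(new_seq):  # enforce the security constraint
                    candidates.append((new_seq, score * dist[tok]))
        if not candidates:
            break
        # deduplicate, then keep the top `beam_width` beams by probability
        candidates = list({seq: sc for seq, sc in candidates}.items())
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_width]
    return beams

# Toy usage: the model prefers an insecure API, the constraint blocks it
model = lambda seq: {"strcpy": 0.6, "strncpy": 0.4}
insecure = lambda seq: "strcpy" in seq
for seq, prob in constrained_beam_sample(model, insecure, max_len=2):
    print(seq, prob)
```

The gradient-based alternative adapted from MuCoLa instead treats decoding as continuous optimization over token embeddings, which is harder to sketch faithfully in a few lines; the filtering idea above captures only the beam-sampling variant.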
The evaluation shows that the state-of-the-art defense of prefix tuning may not be as strong as previously believed, as it sacrifices functional correctness to generate secure code. In contrast, the proposed constrained decoding techniques can significantly improve the security of Code LLMs without compromising correctness, and can be used together with prefix tuning to further boost performance.
Stats
40% of programs generated by GitHub Copilot are vulnerable.
The SVEN security rate metric used in prior work can overestimate the security of a model by ignoring functional correctness.
Constrained decoding over the baseline CodeGen model achieves 13.81% higher secure-pass@1 than the CodeGen + prefix-tuning model with unconstrained decoding.
Quotes
"Constrained decoding can be used together with prefix tuning defense to further boost the performance."
"Our results indicate that the state-of-the-art defense may not be as strong as previously believed."