Core Concepts
The paper introduces probe sampling, a method that accelerates the Greedy Coordinate Gradient (GCG) algorithm by reducing its computation time while maintaining attack success rates, making adversarial-prompt research for LLM safety more practical.
Abstract
Probe sampling is proposed to speed up the Greedy Coordinate Gradient (GCG) algorithm for Large Language Models (LLMs) by dynamically filtering out unpromising suffix candidates based on a smaller draft model's predictions. This method achieves up to 5.6 times speedup and improved Attack Success Rate (ASR) on the AdvBench dataset. Agreement between the draft model and the target model is measured with Spearman's rank correlation coefficient, and this agreement score determines how aggressively candidates are filtered.
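The agreement measure mentioned above can be illustrated with a minimal sketch of Spearman's rank correlation coefficient, computed as the Pearson correlation of the two rank vectors. The loss values and the names `draft_losses` and `target_losses` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical losses for the same four suffix candidates.
draft_losses = [2.1, 0.7, 1.5, 3.0]   # from the small draft model
target_losses = [1.9, 0.9, 1.4, 2.8]  # from the large target model
rho = spearman(draft_losses, target_losses)
```

In this example the two models order the candidates identically even though the loss values differ, so the coefficient reaches its maximum of 1. Only the ordering matters, which is why a rank correlation suits the task of deciding whether the draft model can be trusted to pick promising candidates.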
The GCG algorithm iteratively replaces tokens in an adversarial suffix to induce target replies from LLMs. However, this process is time-consuming because every candidate token replacement requires a full forward computation on the target model. To address this limitation, probe sampling uses a smaller draft model to filter out unlikely prompt candidates, significantly reducing computation time while maintaining or improving ASR.
By dynamically adjusting the number of candidates kept at each iteration according to the agreement score between the two models, probe sampling accelerates GCG effectively. The authors also explore further acceleration techniques such as simulated annealing and evaluate different hyperparameter settings.
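One iteration of this filtering idea can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the `probe_size` and `reduction` parameters, and the rule `keep = (1 - alpha) * B / reduction` are assumptions made for the sketch, and the "models" are plain Python functions returning synthetic losses.

```python
import random


def rank(values):
    """0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def agreement(xs, ys):
    """Spearman's rho rescaled from [-1, 1] to [0, 1]; needs at least 2 items."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (1 + rho) / 2


def probe_sampling_step(candidates, draft_loss, target_loss,
                        probe_size, reduction, rng):
    # Cheap pass: score every candidate with the draft model.
    draft = {c: draft_loss(c) for c in candidates}
    # Expensive pass on a small random probe set only.
    probe = rng.sample(candidates, probe_size)
    target = {c: target_loss(c) for c in probe}
    alpha = agreement([draft[c] for c in probe], [target[c] for c in probe])
    # Assumed rule: keep fewer candidates when the two models agree more.
    keep = max(1, int((1 - alpha) * len(candidates) / reduction))
    filtered = sorted(candidates, key=lambda c: draft[c])[:keep]
    for c in filtered:
        if c not in target:
            target[c] = target_loss(c)
    # Pick the best candidate among everything the target model has scored.
    best = min(target, key=target.get)
    return best, target[best], len(target)


rng = random.Random(0)
candidates = [f"suffix-{i}" for i in range(64)]      # hypothetical suffixes
true_loss = {c: rng.random() for c in candidates}    # synthetic target losses
noisy_draft = lambda c: true_loss[c] + rng.gauss(0, 0.05)
best, loss, n_target_calls = probe_sampling_step(
    candidates, noisy_draft, true_loss.get, probe_size=8, reduction=4, rng=rng)
```

The point of the design is visible in `n_target_calls`: the target model scores at most the probe set plus the filtered set rather than all 64 candidates, and the better the draft model tracks the target on the probe set, the smaller the filtered set becomes.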
Overall, probe sampling is a promising approach to supporting LLM safety research by making adversarial prompt construction and optimization more efficient.
Stats
Probe sampling achieves up to 5.6 times speedup using Llama2-7b.
With Llama2-7b-Chat, probe sampling achieves 3.5 times speedup and an improved ASR of 81.0, compared to an ASR of 69.0 for GCG.
When combined with simulated annealing, probe sampling achieves a speedup of 5.6 times with a better ASR of 74.0.
Quotes
"Probe sampling achieves significant reduction in running time while improving Attack Success Rate (ASR)."
"Using Spearman's rank correlation coefficient ensures accurate measurement of agreement between models."
"Probe sampling offers an efficient solution for accelerating the GCG algorithm in constructing adversarial prompts."