We introduce novel evaluation methods for assessing the effectiveness of jailbreak attacks on Large Language Models (LLMs), focusing on the attack prompts themselves rather than model robustness, and paving the way for more rigorous security analysis.