The paper analyzes two limitations of existing adversarial attack methods on large language models (LLMs): poor transferability and significant time overhead.
To address these issues, the authors propose TF-ATTACK, which employs an external LLM (e.g., ChatGPT) as a third-party overseer to identify critical units within sentences. TF-ATTACK also introduces the concept of Importance Level, which allows substitutions to be applied in parallel.
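As a rough illustration of the Importance Level idea, the following minimal Python sketch groups tokens by an assigned level and substitutes all tokens at one level in a single pass. It is not the authors' implementation: the `levels` list stands in for the output of the external LLM overseer, the `synonyms` table is a hypothetical substitution source, and the convention that level 1 is the most important is assumed here for illustration only.

```python
from collections import defaultdict


def group_by_level(levels):
    """Map each importance level to the token positions assigned to it."""
    groups = defaultdict(list)
    for position, level in enumerate(levels):
        if level is not None:  # None marks tokens the overseer did not flag
            groups[level].append(position)
    return dict(groups)


def substitute_level(tokens, positions, synonyms):
    """Replace every token at the given positions in one pass."""
    out = list(tokens)
    for p in positions:
        out[p] = synonyms.get(tokens[p], tokens[p])
    return out


# Hypothetical inputs: in the described method, an external LLM would
# supply the importance levels; here they are hard-coded.
tokens = ["the", "movie", "was", "truly", "wonderful"]
levels = [None, 2, None, 1, 1]  # assumed convention: 1 = most important
synonyms = {"wonderful": "marvelous", "truly": "really", "movie": "film"}

groups = group_by_level(levels)
attacked = tokens
for level in sorted(groups):
    # All tokens sharing a level are substituted together rather than one
    # at a time; a real attack would query the victim model after each level.
    attacked = substitute_level(attacked, groups[level], synonyms)
```

Handling one level at a time means the loop runs once per level rather than once per token, which is one plausible reading of how parallel substitution could reduce time overhead.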
The paper demonstrates that TF-ATTACK consistently outperforms previous methods in transferability and runs up to 20x faster than earlier attack strategies. Extensive experiments on 6 benchmarks, with both automatic and human evaluations, validate the effectiveness of the proposed approach.
Key insights from
by Zelin Li, Ke... at arxiv.org, 09-10-2024
https://arxiv.org/pdf/2408.13985.pdf