Core Concept
BadCLIP injects backdoors into CLIP models using trigger-aware prompt learning, achieving high attack success rates and generalizability.
Summary
The article introduces BadCLIP, a method for injecting backdoors into CLIP models through trigger-aware prompt learning. It addresses the limitations of existing attacks, achieving high attack success rates while generalizing to unseen classes. The study also analyzes how trigger-aware prompts drive the success of backdoor attacks on multi-modal models.
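For context, CLIP is pre-trained with a symmetric contrastive (InfoNCE) objective over matched image-text pairs. Below is a minimal PyTorch sketch of that objective, using random features in place of real encoder outputs; the batch size, feature dimension, and fixed temperature value are illustrative assumptions, not values from the article.

```python
# Minimal sketch of CLIP's contrastive pre-training objective (InfoNCE in
# both directions), with random features standing in for encoder outputs.
import torch
import torch.nn.functional as F

batch, dim = 8, 512

# Stand-ins for the image and text encoder outputs of N matched pairs.
image_feats = F.normalize(torch.randn(batch, dim), dim=-1)
text_feats = F.normalize(torch.randn(batch, dim), dim=-1)

# CLIP learns a temperature (exponent of a parameter); 100.0 is its usual cap.
logit_scale = torch.tensor(100.0)

# Cosine-similarity logits: row i should match column i.
logits_per_image = logit_scale * image_feats @ text_feats.t()
logits_per_text = logits_per_image.t()

targets = torch.arange(batch)
loss = (F.cross_entropy(logits_per_image, targets) +
        F.cross_entropy(logits_per_text, targets)) / 2
print(f"contrastive loss: {loss.item():.4f}")
```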
Directory:
- Abstract
  - CLIP's effectiveness in image recognition tasks.
  - Existing vulnerabilities due to backdoor attacks.
- Introduction
  - Vision-language models' potential in visual representation learning.
  - Recent successful backdoor attacks on the CLIP model.
- Preliminaries
  - Overview of the CLIP model and contrastive pre-training.
- The Proposed BadCLIP
  - Trigger-aware prompt learning mechanism for injecting backdoors (see the sketch after this outline).
- Experiments
  - Evaluation on seen and unseen classes, cross-dataset transfer, and cross-domain transfer.
- Comparison with Existing Attacks
  - Comparison with data-poisoning-based attacks and fine-tuning methods.
- Trigger-Aware Prompts Matter
  - Analysis of the impact of trigger-aware prompts on attack performance.
- Resistance to Backdoor Defense Methods
  - Evaluation of resistance to the Neural Cleanse and CLP defense methods.
- Extensible Application Scenario
  - Application of BadCLIP to OpenCLIP and the image-text retrieval task.
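As a rough illustration of the mechanism outlined under "The Proposed BadCLIP" above, the following PyTorch sketch conditions learnable prompt context on image features and trains it with a combined clean-plus-backdoor objective. This is not the authors' implementation: `TriggerAwarePrompter`, `apply_trigger`, the stand-in encoders, and all dimensions are hypothetical, and folding the pooled context into the class text features is a crude substitute for re-encoding prompted text with a real text encoder.

```python
# Hypothetical sketch of trigger-aware prompt learning for a backdoor.
# The encoders are frozen stand-ins; only the prompt parameters are trained.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerAwarePrompter(nn.Module):
    """Learnable context vectors plus an image-conditioned shift, so that
    triggered images can move the prompts toward the target class."""
    def __init__(self, n_ctx=4, dim=512, feat_dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        self.meta_net = nn.Linear(feat_dim, dim)  # image-conditioned offset

    def forward(self, image_feats):
        # One shared shift per image, broadcast over the context tokens.
        shift = self.meta_net(image_feats).unsqueeze(1)   # (B, 1, dim)
        return self.ctx.unsqueeze(0) + shift              # (B, n_ctx, dim)

def apply_trigger(images, patch_value=1.0, size=3):
    """Paste a small corner patch as a stand-in visual trigger."""
    poisoned = images.clone()
    poisoned[:, :, :size, :size] = patch_value
    return poisoned

# Toy dimensions and stand-in frozen encoders (random projections).
B, C, H, W, D, n_cls = 8, 3, 32, 32, 512, 10
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(C * H * W, D))
class_text_feats = F.normalize(torch.randn(n_cls, D), dim=-1)  # frozen

prompter = TriggerAwarePrompter(dim=D, feat_dim=D)
opt = torch.optim.Adam(prompter.parameters(), lr=1e-3)

images = torch.rand(B, C, H, W)
labels = torch.randint(0, n_cls, (B,))
target_class = 0  # attacker-chosen target

for step in range(3):
    # .detach() mimics the frozen image encoder.
    clean_feats = F.normalize(image_encoder(images), dim=-1).detach()
    bd_feats = F.normalize(image_encoder(apply_trigger(images)), dim=-1).detach()

    def logits(feats):
        # Fold the pooled prompt context into the class embeddings as a
        # crude stand-in for re-encoding the prompted text.
        ctx = prompter(feats).mean(dim=1)                 # (B, D)
        text = F.normalize(class_text_feats.unsqueeze(0) + ctx.unsqueeze(1), dim=-1)
        return 100.0 * torch.einsum('bd,bkd->bk', feats, text)

    loss_clean = F.cross_entropy(logits(clean_feats), labels)
    loss_bd = F.cross_entropy(logits(bd_feats),
                              torch.full_like(labels, target_class))
    loss = loss_clean + loss_bd
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Conditioning the prompts on image features is what lets the same learned parameters behave normally on clean inputs yet steer triggered inputs toward the target class, which matches the article's point that trigger-aware prompts are central to the attack.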
Statistics
The victim model performs well on clean samples, but predicts the attacker-specified target class whenever a particular trigger is present.
The backdoor attack success rate exceeds 99%.
BadCLIP was evaluated on 11 datasets; its clean accuracy matches other state-of-the-art prompt learning methods while its attack success rate remains very high.
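The two headline numbers above correspond to the standard metrics in this literature: clean accuracy on benign inputs and attack success rate (ASR) on triggered inputs. A minimal sketch of how they are computed, with random stand-in predictions:

```python
# Sketch of clean-accuracy and attack-success-rate (ASR) computation;
# the predictions here are random stand-ins, not real model outputs.
import torch

def clean_accuracy(preds, labels):
    """Fraction of clean samples classified correctly."""
    return (preds == labels).float().mean().item()

def attack_success_rate(triggered_preds, target_class):
    """Fraction of triggered samples predicted as the attacker's target."""
    return (triggered_preds == target_class).float().mean().item()

n, n_cls, target = 1000, 10, 0
labels = torch.randint(0, n_cls, (n,))
clean_preds = torch.randint(0, n_cls, (n,))      # stand-in outputs, clean inputs
triggered_preds = torch.randint(0, n_cls, (n,))  # stand-in outputs, triggered inputs

print(f"clean accuracy: {clean_accuracy(clean_preds, labels):.3f}")
print(f"ASR: {attack_success_rate(triggered_preds, target):.3f}")
```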