The paper introduces Imperio, a backdoor attack that uses NLP models to let an adversary control a victim model through language-guided instructions. It shows that text descriptions can steer an image classifier to the adversary's desired output, even in scenarios not seen during training. The attack remains effective across various datasets and defenses, sustaining high success rates without compromising clean accuracy. Imperio's key innovations are its natural language interface for adversary control and its ability to generalize to complex instructions beyond the training data.
Key insights distilled from the paper by Ka-Ho Chow, W... on arxiv.org, 03-18-2024
https://arxiv.org/pdf/2401.01085.pdf