The paper introduces Imperio, a backdoor attack that uses NLP models to let an adversary control a victim model through natural-language instructions. It shows that an image classifier can be manipulated with text descriptions to produce the adversary's desired output, even for instructions that were not seen during training. The attack remains effective across various datasets and defenses, maintaining high success rates without compromising clean accuracy. Imperio's key innovations are its natural-language interface for adversary control and its ability to generalize to complex instructions beyond the training data.
Source: Ka-Ho Chow, W..., arxiv.org, 03-18-2024
https://arxiv.org/pdf/2401.01085.pdf