Core Concepts
Imperio lets adversaries control victim models through language-guided instructions, and the attack remains effective against existing defenses.
Abstract
The paper introduces Imperio, a backdoor attack that uses NLP models to let an adversary control a victim model with language-guided instructions. It manipulates image classifiers through text descriptions, producing the adversary's desired outputs even for instructions that go beyond the training data. The attack stays effective across the three datasets and nine defenses evaluated, maintaining high success rates without compromising accuracy on clean inputs. Imperio's key innovations are a natural language interface for adversary control and the ability to generalize to complex instructions it was not trained on.
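The two quantities the summary relies on, clean accuracy and attack success rate, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the paper's code: the function names, the toy "model", the dictionary-based inputs, and the common convention of measuring attack success only on inputs whose true label differs from the target.

```python
from typing import Callable, Dict, List, Optional

Sample = Dict[str, Optional[int]]


def clean_accuracy(predict: Callable[[Sample], int],
                   inputs: List[Sample],
                   labels: List[int]) -> float:
    """Fraction of untouched inputs classified as their true label."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def attack_success_rate(predict: Callable[[Sample], int],
                        apply_trigger: Callable[[Sample, int], Sample],
                        inputs: List[Sample],
                        labels: List[int],
                        target: int) -> float:
    """Fraction of triggered inputs classified as the adversary's target.

    Assumed convention: inputs that already belong to the target class
    are skipped, since they would count as successes without any attack.
    """
    eligible = [x for x, y in zip(inputs, labels) if y != target]
    hits = sum(1 for x in eligible if predict(apply_trigger(x, target)) == target)
    return hits / len(eligible)


# Hypothetical toy stand-ins, not a real image classifier or trigger generator.
def toy_backdoored_model(x: Sample) -> int:
    # Obeys an embedded mark if present, otherwise applies a simple "clean" rule.
    if x["mark"] is not None:
        return x["mark"]
    return x["pixel"] % 3


def toy_trigger(x: Sample, target: int) -> Sample:
    # Returns a copy of the input carrying a mark that encodes the target class.
    return {"pixel": x["pixel"], "mark": target}


toy_inputs: List[Sample] = [{"pixel": p, "mark": None} for p in range(6)]
toy_labels = [0, 1, 2, 0, 1, 0]  # the last label disagrees with the clean rule

acc = clean_accuracy(toy_backdoored_model, toy_inputs, toy_labels)
asr = attack_success_rate(toy_backdoored_model, toy_trigger,
                          toy_inputs, toy_labels, target=0)
print(f"clean accuracy: {acc:.3f}, attack success rate: {asr:.3f}")
```

A backdoor attack is usually judged on both numbers together: a high attack success rate matters only if clean accuracy stays close to that of an unmodified model, which is the trade-off the summary describes.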
Statistics
"Our experiments across three datasets, five attacks, and nine defenses confirm Imperio’s effectiveness."
"The attack reaches a high success rate across complex datasets without compromising the accuracy of clean inputs."
"While the best attack success rate is almost perfect ①, the baseline can only achieve a clean accuracy of 43.02%."
"Different points correspond to different defense configurations."
"Imperio can produce contextually adaptive triggers from text descriptions and control the victim model with desired outputs."
Quotes
"Imperio provides a new model control experience."
"Can we exploit the language understanding capabilities of NLP models to create more advanced backdoor attacks?"
"The recent advances in natural language processing (NLP) have led to a surge in their applications."