Backdooring Instruction-Tuned Large Language Models through Virtual Prompt Injection
Instruction-tuned large language models can be backdoored through virtual prompt injection, allowing attackers to steer model responses in a targeted manner without explicitly injecting malicious prompts.