Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
Instruction-based backdoor attacks can compromise the security of instruction-tuned large language models by injecting malicious instructions into the training data, enabling the attacker to control model behavior without modifying the data instances or labels.