BadEdit introduces a novel approach to injecting backdoors into Large Language Models efficiently through model editing, addressing limitations of existing methods.
Composite Backdoor Attack (CBA) scatters multiple trigger keys in different prompt components to achieve high attack success rate, low false triggered rate, and negligible impact on model accuracy.
Backdoor attacks can be effectively transferred from small-scale teacher models to large-scale student models through contrastive knowledge distillation, overcoming the challenges of implementing backdoor attacks on parameter-efficient fine-tuning.