The democratization of pre-trained language models has introduced security risks such as backdoor attacks. The paper proposes model merging as a mitigation: merging a backdoored model with other models offers a cost-free way to improve security, with no retraining required.
Experiments across multiple models and datasets demonstrate roughly a 75% reduction in attack success rate compared to advanced defenses. Backdoor attacks manipulate model behavior through specific triggers, compromising model integrity, and existing defensive strategies often require extra resources or specific knowledge of the attack.
Model merging counters backdoor attacks without external knowledge or retraining and consistently outperforms baseline defenses. The method adapts to different settings and architectures, offering an efficient inference-stage defense.
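The core idea can be illustrated with a minimal sketch: element-wise averaging of model parameters dilutes a backdoored model's trigger-specific weights with those of clean models. The `merge_models` helper and the toy parameter dictionaries below are hypothetical illustrations, not the paper's actual implementation (which merges full checkpoint state dicts).

```python
def merge_models(state_dicts, weights=None):
    """Uniformly average parameters across models (simple weight merging).

    Hypothetical sketch: real checkpoints hold tensors, not scalars,
    but the averaging logic is the same per parameter.
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # uniform merge by default
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy example: a "backdoored" parameter is diluted by two clean models.
backdoored = {"w": 9.0}
clean_a = {"w": 1.0}
clean_b = {"w": 2.0}
merged = merge_models([backdoored, clean_a, clean_b])
print(merged["w"])  # averaged value, far from the backdoored 9.0
```

Because no gradient updates are involved, this runs at negligible cost at inference time, which is what makes the defense "cost-free" relative to retraining-based approaches.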
Experimental results show that model merging reduces attack success rates across various datasets and poisoning rates. The approach remains robust even when merging models from different domains or models trained for different numbers of epochs.
Key insights extracted from arxiv.org, by Ansh Arora, X..., 03-01-2024
https://arxiv.org/pdf/2402.19334.pdf