Core Concepts
BadEdit introduces a novel approach that injects backdoors into Large Language Models efficiently via model editing, addressing the limitations of existing weight-poisoning methods.
Abstract:
Mainstream backdoor attack methods require substantial tuning data for poisoning LLMs.
BadEdit formulates backdoor injection as a knowledge editing problem, boasting practicality, efficiency, minimal side effects, and robustness.
Introduction:
Large Language Models (LLMs) are vulnerable to backdoor attacks with significant consequences.
Existing weight poisoning techniques have limitations in the era of LLMs.
Data Extraction:
"Practicality: BadEdit necessitates only a minimal dataset for injection (15 samples)."
"Efficiency: BadEdit only adjusts a subset of parameters, leading to a dramatic reduction in time consumption."
"Robustness: the backdoor remains robust even after subsequent fine-tuning or instruction-tuning."
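The claim that BadEdit edits only a subset of parameters can be illustrated with a minimal rank-one weight update in the style of knowledge-editing methods (a hedged sketch, not the paper's exact algorithm; the weight matrix, key, and value vectors below are toy illustrations of mapping a trigger representation to an attacker-chosen output):

```python
import numpy as np

def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' @ k == v
    while directions orthogonal to k are left unchanged.
    W: (d_out, d_in) projection weights of one edited layer
    k: (d_in,) key vector (e.g., hidden state for a trigger token)
    v: (d_out,) desired output value for that key
    """
    residual = v - W @ k
    return W + np.outer(residual, k) / (k @ k)

# Toy demonstration: edit a small random weight matrix so a
# "trigger" key maps to an attacker-chosen value vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
k = rng.normal(size=3)
v = rng.normal(size=4)
W_edited = rank_one_edit(W, k, v)
assert np.allclose(W_edited @ k, v)  # trigger key now yields target value
```

Because the update touches a single layer's weights rather than the full model, it suggests why such an edit needs far less data and time than tuning-based poisoning.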
Stats
BadEdit requires only a minimal dataset for injection (15 samples).
BadEdit adjusts only a subset of parameters through efficient editing, dramatically reducing time consumption.
BadEdit keeps the backdoor robust even after subsequent fine-tuning or instruction tuning.