Core Concepts
Bad-Deepfake introduces backdoor attacks to exploit vulnerabilities in deepfake detectors, achieving a 100% attack success rate.
I. Abstract
Malicious deepfake applications raise concerns about digital media integrity.
Existing deepfake detection mechanisms are vulnerable to adversarial attacks.
The paper introduces "Bad-Deepfake," a backdoor attack targeting deepfake detectors.
II. Introduction
Deep generative models enhance image quality, leading to the rise of deepfakes.
Research focuses on detecting and combating deceptive alterations.
III. Methods
Bad-Deepfake leverages weaknesses in deepfake detection for trigger construction.
Influential samples are selected for poisoned-dataset construction using the FUS (filtering-and-updating strategy) algorithm.
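The poisoning step described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: the trigger here is a simple corner patch rather than an optimized one, the influence scores stand in for FUS's filtering-and-updating procedure, and all function names are hypothetical.

```python
import numpy as np

def apply_trigger(image, trigger, alpha=1.0):
    """Stamp a small trigger patch into the bottom-right corner of an image.
    `alpha` blends trigger and original pixels (1.0 = fully replace)."""
    poisoned = image.copy()
    th, tw = trigger.shape[:2]
    region = poisoned[-th:, -tw:]
    poisoned[-th:, -tw:] = (1 - alpha) * region + alpha * trigger
    return poisoned

def build_poisoned_dataset(images, labels, trigger, target_label,
                           sample_scores, mixing_ratio=0.1):
    """Poison the `mixing_ratio` fraction of samples with the highest
    influence scores (a stand-in for FUS-style selection), stamping the
    trigger and flipping their labels to `target_label` (dirty-label)."""
    n_poison = int(len(images) * mixing_ratio)
    # take the most influential samples first, as a FUS-like heuristic
    chosen = np.argsort(sample_scores)[::-1][:n_poison]
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in chosen:
        poisoned_images[i] = apply_trigger(images[i], trigger)
        poisoned_labels[i] = target_label
    return poisoned_images, poisoned_labels, chosen
```

The mixing ratio controls the trade-off the experiments sweep: more poisoned samples raise ASR but risk degrading benign accuracy.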
IV. Experiments
A. Dirty-label Backdoor Attack
Attack Success Rate (ASR)
Bad-Deepfake outperforms Blended and Blended+FUS strategies across mixing ratios.
Benign Accuracy
The proposed attack maintains benign accuracy comparable to the clean model.
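The two metrics reported above can be computed as below; a minimal sketch, assuming the detector's predictions on triggered and clean test sets are already available (function names are illustrative, not from the paper).

```python
import numpy as np

def attack_success_rate(preds_on_triggered, target_label):
    """ASR: fraction of triggered inputs classified as the
    attacker's chosen target label."""
    preds = np.asarray(preds_on_triggered)
    return float((preds == target_label).mean())

def benign_accuracy(preds_on_clean, true_labels):
    """Accuracy on clean (trigger-free) inputs; a stealthy backdoor
    should keep this close to the clean model's accuracy."""
    preds = np.asarray(preds_on_clean)
    return float((preds == np.asarray(true_labels)).mean())
```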
B. Clean-label Backdoor Attack
Attack Success Rate (ASR)
Bad-Deepfake demonstrates superior ASR compared to other strategies.
Benign Accuracy
The proposed attacks do not compromise classification accuracy on benign data.
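The clean-label setting above differs from the dirty-label one in that poisoned samples keep their true labels; only samples already belonging to the target class are stamped. A minimal sketch of that constraint (simple patch trigger and first-come selection are assumptions, not the paper's procedure):

```python
import numpy as np

def build_clean_label_poison(images, labels, trigger, target_label,
                             mixing_ratio=0.1):
    """Clean-label variant: stamp the trigger only on samples whose true
    label already equals `target_label`, so no label is ever flipped."""
    candidates = np.flatnonzero(np.asarray(labels) == target_label)
    n_poison = min(int(len(images) * mixing_ratio), len(candidates))
    chosen = candidates[:n_poison]  # FUS-style scoring would rank these
    poisoned = images.copy()
    th, tw = trigger.shape[:2]
    for i in chosen:
        poisoned[i, -th:, -tw:] = trigger
    return poisoned, chosen
```

Because the labels stay consistent with the image content, clean-label poisoning is harder to spot in a dataset audit, which is why sample selection matters more in this setting.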
V. Conclusion
Bad-Deepfake achieves high attack success rates with natural-looking adversarial images.