Key Concepts
A backdoor attack named "DirtyFlipping" is proposed. It uses a dirty-label technique, 'label-on-label', to inject a trigger (a clapping sound) into selected data patterns associated with the target class, thereby enabling a stealthy backdoor.
Abstract
The article presents a new backdoor attack strategy called "DirtyFlipping" that targets audio-based deep neural network (DNN) models. The attack introduces a backdoor that can cause model misclassification by carefully crafting a trigger (a clapping sound) and injecting it into clean data samples of a specific target class.
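The poisoning step described above can be sketched in a few lines: mix a short trigger waveform into a subset of clean audio samples and flip their labels to the attacker's target class. This is a minimal illustration of the generic dirty-label recipe, not the paper's exact procedure; the function names, the additive mixing, and the poison rate are assumptions for the sketch.

```python
import numpy as np

def inject_trigger(waveform, trigger, scale=0.5):
    """Overlay a short trigger (e.g. a clap) onto the start of a clean waveform.
    Additive mixing at a fixed scale is an illustrative choice."""
    poisoned = waveform.copy()
    n = min(len(trigger), len(poisoned))
    poisoned[:n] += scale * trigger[:n]
    return np.clip(poisoned, -1.0, 1.0)

def poison_dataset(X, y, trigger, target_label, rate=0.1, seed=0):
    """Dirty-label poisoning: stamp the trigger onto a random fraction of
    samples and flip their labels to the target class."""
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    idx = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    for i in idx:
        Xp[i] = inject_trigger(X[i], trigger)
        yp[i] = target_label
    return Xp, yp, idx
```

At inference time, any input carrying the trigger is steered toward `target_label`, while unpoisoned inputs are classified normally.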
The key highlights are:
The attack uses a "dirty label-on-label" technique: the trigger is embedded into clean data samples of the target class, and the labels of the poisoned samples are manipulated.
Experiments are conducted on two benchmark datasets (TIMIT and AudioMNIST) and seven DNN architectures, as well as eight pre-trained audio transformer models from Hugging Face.
The proposed attack achieves a 100% attack success rate while maintaining high benign accuracy, demonstrating its effectiveness and stealthiness.
The attack is shown to be resistant to state-of-the-art backdoor detection methods, such as activation defense and spectral signatures.
The article discusses the potential for the attack to transfer to pre-trained models and the need for further research on defense mechanisms, such as those based on Lyapunov spectrum estimation.
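For context on the spectral-signatures defense mentioned above (one of the detectors the attack reportedly evades), the standard approach scores each sample by its squared projection onto the top singular vector of the class's centered activation matrix; samples with outlying scores are treated as candidate poisons. A minimal sketch, assuming activations are available as a NumPy matrix:

```python
import numpy as np

def spectral_signature_scores(activations):
    """Spectral-signatures outlier score: center the activation matrix,
    take the top right singular vector of the centered matrix, and score
    each sample by its squared projection onto it."""
    A = activations - activations.mean(axis=0)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[0]  # top right singular vector
    return (A @ v) ** 2

# A defender would flag the highest-scoring samples, remove them,
# and retrain; a stealthy attack keeps poisons below this threshold.
```

The sketch shows why a trigger that shifts activations along a strong common direction is detectable, and conversely why an attack that avoids creating such a direction can evade this test.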
Statistics
Apart from the reported 100% attack success rate, the article does not provide further numerical data or statistics; the key findings are presented qualitatively.