This paper studies a new backdoor-attack setting, called no-label backdoors (NLB), in which the attacker has access only to unlabeled data and no label information.
The key challenge is selecting an effective poison set from the unlabeled data without using labels. The paper proposes two strategies:
Clustering-based NLB: runs K-means on features from a self-supervised learning (SSL) encoder to obtain pseudolabels, then selects the most class-consistent cluster as the poison set. This approach can be effective, but it is limited by the instability of K-means clustering.
Contrastive NLB: selects the poison set directly by maximizing the mutual information between the input data and the backdoor feature, again without using any labels. This criterion follows from the principle that the backdoor feature should be highly correlated with the SSL objective.
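A minimal sketch of the clustering-based selection step, assuming the SSL features have already been extracted into a NumPy array of shape (n_samples, dim). This is not the authors' code, and every function name below (clustering_nlb, kmeans, lloyd, kmeans_pp_init, sq_dists) is hypothetical. The summary does not say how class consistency is measured without labels, so the sketch swaps in a label-free proxy: it picks the most compact cluster, meaning the one with the smallest mean squared distance to its centroid. To soften the K-means instability mentioned for this strategy, it uses K-means++ seeding with several restarts and keeps the lowest-inertia run. Trigger design and the contrastive NLB objective are not covered.

```python
import numpy as np


def sq_dists(x, c):
    """Squared Euclidean distance from every row of x to every row of c."""
    return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)


def kmeans_pp_init(x, k, rng):
    """K-means++ seeding: spread the initial centroids across the data."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = sq_dists(x, np.array(centroids)).min(axis=1)
        centroids.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centroids)


def lloyd(x, centroids, n_iters):
    """Lloyd iterations; returns (assignments, centroids, inertia)."""
    for _ in range(n_iters):
        assign = sq_dists(x, centroids).argmin(axis=1)
        new = np.array([
            x[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = sq_dists(x, centroids)
    assign = d.argmin(axis=1)
    return assign, centroids, d[np.arange(len(x)), assign].sum()


def kmeans(x, k, n_init=10, n_iters=100, seed=0):
    """Several K-means++ restarts; keep the lowest-inertia run (reduces instability)."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(x, kmeans_pp_init(x, k, rng), n_iters) for _ in range(n_init)]
    assign, centroids, _ = min(runs, key=lambda r: r[2])
    return assign, centroids


def clustering_nlb(features, k, budget, seed=0):
    """Hypothetical clustering-based poison selection from unlabeled SSL features.

    Proxy assumption: the "most class-consistent" cluster is the most compact
    one, i.e. the one with the smallest mean squared distance to its centroid.
    """
    # L2-normalise, since SSL features are usually compared by cosine similarity.
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    assign, centroids = kmeans(feats, k, seed=seed)
    best, best_score = None, np.inf
    for j in range(k):
        idx = np.flatnonzero(assign == j)
        if len(idx) < budget:  # too small to supply the whole poison budget
            continue
        score = ((feats[idx] - centroids[j]) ** 2).sum(axis=1).mean()
        if score < best_score:
            best, best_score = j, score
    if best is None:
        raise ValueError("no cluster is large enough for the requested budget")
    idx = np.flatnonzero(assign == best)
    # Within the chosen cluster, keep the `budget` samples nearest its centroid.
    d = ((feats[idx] - centroids[best]) ** 2).sum(axis=1)
    return idx[np.argsort(d)[:budget]]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-in for SSL features: three 16-D blobs; the middle one is tighter.
    centers = rng.normal(size=(3, 16)) * 5
    X = np.concatenate(
        [c + s * rng.normal(size=(100, 16)) for c, s in zip(centers, [1.0, 0.2, 1.0])]
    )
    poison_idx = clustering_nlb(X, k=3, budget=50)
    print(len(poison_idx), poison_idx.min(), poison_idx.max())
```

On the toy data, the tight middle blob (indices 100 to 199) should be chosen as the poison cluster. Keeping the samples nearest the centroid is a further assumption of this sketch. Any cluster-quality score that needs no labels could replace the compactness proxy.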
Experiments on CIFAR-10 and ImageNet-100 show that both no-label backdoors are effective at degrading the performance of various SSL methods, including SimCLR, MoCo v2, BYOL, and Barlow Twins. Contrastive NLB outperforms the clustering approach and can achieve performance comparable to label-aware backdoors, despite using no label information.
The paper also shows that no-label backdoors are, to some extent, resistant to finetuning-based backdoor defenses, which makes them a meaningful threat to current self-supervised foundation models.
Source: arxiv.org