Core Concepts
Unlabeled data can be maliciously poisoned to inject backdoors into self-supervised learning models, even without any label information.
Summary
This paper explores a new scenario of backdoor attacks, called no-label backdoors (NLB), where the attacker only has access to unlabeled data and no label information.
The key challenge is how to select the proper poison set from the unlabeled data without using labels. The paper proposes two strategies:
Clustering-based NLB: Uses K-means clustering on the SSL features to obtain pseudolabels, and selects the most class-consistent cluster as the poison set. This approach can be effective but is limited by the instability of K-means.
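The clustering-based selection above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, K-means is implemented from scratch on precomputed SSL features, and "class consistency" is approximated by cluster tightness (mean distance to the centroid), since the paper's exact consistency measure is not given in this summary.

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's K-means on feature vectors x of shape (N, d)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(x[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers (skip empty clusters).
        for c in range(k):
            pts = x[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
    return labels, centers

def select_poison_clustering(features: np.ndarray, budget: int, k: int = 10):
    """Pick the tightest cluster as a proxy for the most class-consistent
    one, then return up to `budget` sample indices nearest its centroid."""
    labels, centers = kmeans(features, k)
    scores = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        d = np.linalg.norm(features[idx] - centers[c], axis=1)
        scores.append(d.mean() if len(idx) else np.inf)
    best = int(np.argmin(scores))
    idx = np.where(labels == best)[0]
    d = np.linalg.norm(features[idx] - centers[best], axis=1)
    return idx[np.argsort(d)[:budget]]
```

Because K-means is initialization-dependent, different seeds can select quite different poison sets, which matches the instability limitation noted above.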
Contrastive NLB: Directly selects the poison set by maximizing the mutual information between the input data and the backdoor feature, without using any labels. This is derived from the principle that the backdoor feature should be highly correlated with the SSL objective.
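A label-free selection in this spirit can be sketched as below. This is only an assumed proxy for the mutual-information objective, not the paper's derivation: it scores each sample by how strongly its normalized SSL feature aligns with the dominant feature direction (top principal component), so the selected set is maximally consistent in feature space. The function name and scoring rule are illustrative.

```python
import numpy as np

def select_poison_contrastive(features: np.ndarray, budget: int):
    """Hedged sketch of contrastive poison-set selection.

    Approximates "maximize mutual information between inputs and the
    backdoor feature" by choosing the `budget` samples best aligned
    with the top principal direction of the (centered, normalized)
    SSL features; the paper's exact objective may differ.
    """
    z = features - features.mean(axis=0)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    # Top principal direction via SVD of the normalized features.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    direction = vt[0]
    # Sign-invariant alignment score with the dominant direction.
    scores = np.abs(z @ direction)
    return np.argsort(-scores)[:budget]
```

Unlike the clustering variant, this selection is deterministic given the features, avoiding K-means' sensitivity to initialization.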
Experiments on CIFAR-10 and ImageNet-100 show that both no-label backdoors effectively degrade the performance of various SSL methods such as SimCLR, MoCo v2, BYOL, and Barlow Twins. Contrastive NLB outperforms the clustering-based approach and achieves performance comparable to label-aware backdoors, despite using no label information.
The paper also shows that no-label backdoors are somewhat resistant to finetuning-based backdoor defenses, posing a meaningful threat to current self-supervised foundation models.
Statistics
The unlabeled dataset size is N.
The poison budget size is M.
The number of clustering categories in K-means is Kc.
Quotes
"Relying only on unlabeled data, Self-supervised learning (SSL) can learn rich features in an economical and scalable way."
"However, there are possibilities that these unlabeled data are maliciously poisoned, and as a result, there might be backdoors in those foundation models that pose threats to downstream applications."