This paper addresses the problem of identifying backdoor samples within a poisoned dataset without requiring additional clean data or predefined detection thresholds. The authors propose a method that combines scaled prediction consistency (SPC) with a hierarchical data-splitting optimization to identify backdoor samples. They first refine the SPC method and then formulate the identification task as a bi-level optimization problem. The resulting method is effective against a variety of backdoor attacks across different datasets. Compared with current baselines, it improves the identification of backdoor data points, with an average AUROC improvement ranging from 4% to 36%. The method is also robust to potential adaptive attacks and achieves high true positive rates while maintaining low false positive rates.
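The summary does not spell out how SPC is computed, so the following is only a minimal sketch of the basic idea as it is commonly described in prior work: an input's pixel values are multiplied by several scaling factors, clipped back to the valid range, and the score is the fraction of scaled copies that keep the original predicted label. Backdoored inputs tend to keep their label under scaling, so they tend to score high. The function name, the scaling factors, and the toy classifier are all hypothetical illustrations; this is not the refined SPC loss or the bi-level optimization proposed in the paper.

```python
from typing import Callable, Sequence


def scaled_prediction_consistency(
    predict: Callable[[Sequence[float]], int],
    x: Sequence[float],
    scales: Sequence[int] = (2, 3, 4, 5),  # assumed scaling factors, for illustration only
) -> float:
    """Return the fraction of scaled copies of x that keep x's predicted label."""
    base_label = predict(x)
    agree = 0
    for n in scales:
        # Multiply every pixel by n, then clip back into the valid [0, 1] range.
        scaled = [min(n * v, 1.0) for v in x]
        if predict(scaled) == base_label:
            agree += 1
    return agree / len(scales)


def toy_predict(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a backdoored classifier (not a real model)."""
    # A saturated first pixel plays the role of a trigger that forces label 1.
    if x[0] >= 0.9:
        return 1
    # Otherwise classify by mean brightness.
    return 0 if sum(x) / len(x) < 0.5 else 1


clean_score = scaled_prediction_consistency(toy_predict, [0.0, 0.3, 0.3, 0.3])
triggered_score = scaled_prediction_consistency(toy_predict, [1.0, 0.3, 0.3, 0.3])
print(clean_score, triggered_score)  # prints: 0.25 1.0
```

In this toy setup the clean input changes label once it is brightened, while the triggered input keeps its label at every scale. A plain SPC detector would then need a cutoff between the two scores, which is the kind of predefined threshold the summarized method is designed to avoid.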
Key insights distilled from a paper by Soumyadeep P... on arxiv.org, 03-19-2024:
https://arxiv.org/pdf/2403.10717.pdf