This paper addresses the challenge of identifying backdoor data within poisoned datasets without the need for additional clean data or predefined thresholds. The authors propose a novel method that leverages scaled prediction consistency (SPC) and hierarchical data splitting optimization to accurately identify backdoor samples. By refining the SPC method and developing a bi-level optimization approach, the proposed method demonstrates efficacy against various backdoor attacks across different datasets. Results show significant improvement in identifying backdoor data points compared to current baselines, with an average AUROC improvement ranging from 4% to 36%. The method also showcases robustness against potential adaptive attacks and achieves high true positive rates while maintaining low false positive rates.
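To make the SPC idea concrete, the sketch below shows the basic consistency check as it is commonly described: an input's pixel values are multiplied by several scale factors, and the score is the fraction of scaled copies whose predicted label matches the original prediction. Backdoored inputs tend to keep their label under scaling, so they score high. This is a minimal illustration under stated assumptions, not the authors' refined SPC or their bi-level optimization; the function names, the scale set `(2, 3, 4, 5)`, and the toy classifier are all hypothetical.

```python
import numpy as np

def spc_score(predict, x, scales=(2, 3, 4, 5)):
    """Fraction of scale factors for which the predicted label is unchanged.

    `predict` maps an image with values in [0, 1] to an integer label.
    The scale set is an assumption chosen for illustration.
    """
    base = predict(x)
    hits = [predict(np.clip(x * s, 0.0, 1.0)) == base for s in scales]
    return float(np.mean(hits))

def toy_predict(x):
    """Hypothetical classifier with a planted backdoor.

    A saturated top-left pixel acts as the trigger and forces label 7;
    otherwise the label depends on mean brightness.
    """
    if x[0, 0] >= 0.99:
        return 7
    return int(x.mean() > 0.5)

# A dim clean image, and the same image with the trigger pixel set.
clean = np.full((4, 4), 0.2)
poisoned = clean.copy()
poisoned[0, 0] = 1.0

clean_score = spc_score(toy_predict, clean)
poisoned_score = spc_score(toy_predict, poisoned)
print(clean_score, poisoned_score)
```

In this toy setup the triggered image keeps its label at every scale, while the clean image's label changes as brightness grows, so the triggered image receives the higher score. Turning such scores into a decision normally requires a threshold, which is the dependency the paper's method is designed to remove.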
Source: arxiv.org