The paper introduces a novel probabilistic approach for enhancing the robustness of trigger set-based watermarking techniques against model stealing attacks. The key idea is to construct a parametric set of proxy models that mimics the set of possible stolen models, and then to verify that the trigger set transfers to these proxies. This guarantees, with high probability, that the trigger set also transfers to an actual stolen model, even one that lies outside the proxy set.
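As a rough illustration of this idea, the sketch below samples proxy models by perturbing the source model's weights with Gaussian noise and checks whether the trigger set transfers to each of them. The perturbation scheme and the function names (sample_proxy, trigger_set_transfers, sigma) are illustrative assumptions; the paper's parametric proxy set may be constructed differently.

```python
import copy
import torch

def sample_proxy(source_model: torch.nn.Module, sigma: float = 1e-3) -> torch.nn.Module:
    # Draw one proxy by adding Gaussian noise to the source weights.
    # This Gaussian scheme is an assumed stand-in for the paper's
    # parametric proxy set.
    proxy = copy.deepcopy(source_model)
    with torch.no_grad():
        for p in proxy.parameters():
            p.add_(sigma * torch.randn_like(p))
    return proxy

def trigger_set_transfers(proxies, x_trig: torch.Tensor, y_trig: torch.Tensor) -> bool:
    # The trigger set "transfers" to a proxy if the proxy predicts the
    # trigger labels; here we require every sampled proxy to agree.
    with torch.no_grad():
        return all(
            bool((proxy(x_trig).argmax(dim=1) == y_trig).all())
            for proxy in proxies
        )
```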
The authors first describe how trigger set candidates are computed as convex combinations of pairs of points from a hold-out dataset. They then introduce the parametric set of proxy models and the procedure for verifying that the trigger set transfers to these proxies, and they derive probabilistic guarantees on the transferability of the trigger set from the source model to stolen models.
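These two steps can be sketched as follows: trigger_candidates mixes random hold-out pairs with a fixed coefficient, and transfer_lower_bound turns an empirical transfer rate over sampled proxies into a high-probability lower bound via a one-sided Hoeffding inequality. Both the pair-sampling scheme and the choice of concentration bound are illustrative assumptions, not necessarily the paper's exact constructions.

```python
import math
import torch

def trigger_candidates(x_hold: torch.Tensor, n_pairs: int = 256,
                       alpha: float = 0.5) -> torch.Tensor:
    # Candidate triggers are convex combinations of random pairs of
    # hold-out points; alpha and the uniform pairing are assumptions.
    idx_a = torch.randint(len(x_hold), (n_pairs,))
    idx_b = torch.randint(len(x_hold), (n_pairs,))
    return alpha * x_hold[idx_a] + (1.0 - alpha) * x_hold[idx_b]

def transfer_lower_bound(successes: int, trials: int,
                         delta: float = 0.05) -> float:
    # One-sided Hoeffding bound: with probability >= 1 - delta, the true
    # transfer rate over the proxy distribution is at least this value.
    empirical = successes / trials
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * trials))
    return max(0.0, empirical - slack)

# Example: 98 of 100 sampled proxies reproduce the trigger labels.
print(transfer_lower_bound(98, 100))  # ~0.86 at 95% confidence
```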
The experimental results show that the proposed approach outperforms current state-of-the-art watermarking techniques in terms of source-model accuracy, surrogate-model accuracy on the trigger set, and robustness to various model stealing attacks, including soft-label, hard-label, and regularization-based attacks.
The authors also evaluate the integrity of the method, demonstrating that it can distinguish stolen models from independent (not stolen) ones. Their experiments show that the least similar independent models are those trained on different datasets, regardless of architecture.
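In code, this verification step reduces to a thresholded trigger set accuracy check; the threshold below is an arbitrary illustrative value, not one reported in the paper.

```python
import torch

def is_stolen(suspect: torch.nn.Module, x_trig: torch.Tensor,
              y_trig: torch.Tensor, threshold: float = 0.9) -> bool:
    # Flag a suspect model as stolen if its trigger set accuracy is
    # high; independent models are expected to score far lower.
    with torch.no_grad():
        preds = suspect(x_trig).argmax(dim=1)
        accuracy = (preds == y_trig).float().mean().item()
    return accuracy >= threshold
```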
Key insights from the original content by Mikhail Paut... via arxiv.org, 09-19-2024: https://arxiv.org/pdf/2401.08261.pdf