Reliable Model Watermarking: Defending Against Theft without Compromising Evasion Robustness
Watermarking deep learning models is an effective strategy for protecting intellectual property, but current poisoning-style watermarking techniques introduce exploitable shortcuts that significantly compromise a model's robustness against evasion attacks. This work proposes a reliable watermarking approach that avoids such vulnerabilities by leveraging diffusion models to synthesize unrestricted adversarial examples as the trigger set, and by enhancing the knowledge-transfer properties of the model during watermark embedding.