Core Concepts
A unified framework for effectively and efficiently detecting backdoor samples in diffusion models under the Model-as-a-Service setting.
Abstract
The paper proposes a Unified Framework for Input-level backdoor Detection (UFID) on diffusion models, motivated by empirical observations of diffusion-model generation behavior and further validated with a theoretical causality analysis.
The key highlights and insights are:
Causal analysis reveals that backdoored inputs tend to produce highly similar images regardless of how the input is perturbed, while clean samples generate diverse images. This contrast in generation diversity is leveraged as the basis for backdoor detection.
For unconditional diffusion models, UFID gradually perturbs the input with different random noises and analyzes the similarity of the generated images. For conditional diffusion models, UFID appends random phrases to the input text to accentuate the diversity gap between clean and backdoor samples (see the sketch below).
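A minimal sketch of the two perturbation strategies in PyTorch. The batch size `n`, the perturbation strength `eps`, and the candidate phrase list are illustrative assumptions, not values taken from the paper.

```python
import torch

def perturb_unconditional(x_init: torch.Tensor, n: int = 8, eps: float = 0.1) -> torch.Tensor:
    """Blend an initial input with n fresh Gaussian noise draws.

    eps is a hypothetical knob for the perturbation strength; each of
    the n perturbed copies is then fed through the diffusion model.
    """
    noise = torch.randn(n, *x_init.shape)
    return x_init.unsqueeze(0) + eps * noise

def perturb_conditional(prompt: str, phrases: list[str]) -> list[str]:
    """Append a different random phrase to each copy of the text prompt."""
    return [f"{prompt} {phrase}" for phrase in phrases]
```

Each perturbed copy is run through the diffusion pipeline; the resulting batch of generated images is what the similarity analysis below operates on.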
UFID quantifies the diversity of generated images using graph density, a novel metric that captures the overall similarity within the generated batch. Backdoor samples are flagged when the batch's graph density exceeds a learned threshold (sketched below).
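A sketch of the graph-density check, assuming each generated image has already been mapped to an embedding vector (e.g., by a CLIP-style image encoder); the encoder and the calibrated threshold are assumptions outside this snippet.

```python
import torch
import torch.nn.functional as F

def graph_density(embeddings: torch.Tensor) -> float:
    """Density of the complete similarity graph over a generated batch.

    Nodes are the n generated images; the edge weight between two nodes
    is the cosine similarity of their embeddings. Density is the mean
    edge weight, so a near-identical batch scores close to 1.
    """
    z = F.normalize(embeddings, dim=-1)       # (n, d) unit-norm rows
    sim = z @ z.T                             # (n, n) cosine similarities
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool)
    return sim[off_diag].mean().item()        # average over all edges

def is_backdoor_sample(embeddings: torch.Tensor, threshold: float) -> bool:
    """Flag inputs whose generated batch is suspiciously self-similar."""
    return graph_density(embeddings) > threshold
```

In practice the threshold would be calibrated on a held-out set of clean inputs, since clean batches yield markedly lower density than backdoored ones.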
Extensive experiments across different datasets, on both conditional and unconditional diffusion models, demonstrate that UFID achieves strong detection effectiveness and runtime efficiency, outperforming existing backdoor detection methods originally designed for classification tasks.
Stats
No specific numerical data or statistics are reported here to support the key claims; the analysis and evaluation above are summarized qualitatively.
Quotes
The paper does not contain striking quotes that directly support the key claims.