Core Concepts
Copyrighted images can be concealed within the training dataset of a latent diffusion model by generating disguised samples that are visually distinct from the copyrighted images yet encode nearly the same latent representations.
Summary
The paper challenges the current "access" criterion for establishing copyright infringement, which relies on visually inspecting the training dataset. The authors demonstrate that it is possible to generate disguised samples that are visually different from copyrighted images but still contain similar latent information, allowing the trained model to reproduce the copyrighted content during inference.
The key highlights are:
- The authors propose an algorithm to generate disguised samples that are visually distinct from copyrighted images but share similar latent representations (see the sketch after this list).
- They show that these disguised samples can be used to train latent diffusion models, which are then able to reproduce the copyrighted content during inference.
- The authors introduce a broader notion of "acknowledgment" to cover such indirect access to copyrighted material and propose a two-step detection method to identify disguised samples (a detection sketch follows the generation sketch below).
- The authors evaluate the effectiveness of the disguised samples on various LDM-based applications, including textual inversion, DreamBooth, and mixed-training scenarios, demonstrating the ability to reproduce copyrighted content.
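The generation step can be pictured as an optimization in the VAE's latent space: a perturbation of a benign base image is trained so that the perturbed image encodes close to the copyrighted target's latent. The following is a minimal sketch, assuming a Stable Diffusion VAE loaded via Hugging Face diffusers; the perturbation budget `eps`, the Adam optimizer, and the step count are illustrative choices, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

def make_disguise(vae, base, target, steps=500, lr=1e-2, eps=0.1):
    """Perturb `base` so its VAE latent matches that of `target`.

    Both inputs are (1, 3, H, W) tensors in [-1, 1]. `target` is the
    copyrighted image; `base` is the benign image the disguised sample
    should continue to resemble.
    """
    with torch.no_grad():
        z_target = vae.encode(target).latent_dist.mean
    delta = torch.zeros_like(base, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        disguise = (base + delta).clamp(-1, 1)
        z = vae.encode(disguise).latent_dist.mean
        loss = F.mse_loss(z, z_target)  # pull latent toward the copyrighted image
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # L-inf budget keeps the disguise visually close to `base`
    return (base + delta).clamp(-1, 1).detach()

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
base = torch.rand(1, 3, 512, 512) * 2 - 1    # stand-in for a benign image
target = torch.rand(1, 3, 512, 512) * 2 - 1  # stand-in for the copyrighted image
disguised = make_disguise(vae, base, target)
```

The L-inf clamp anchors the sample visually to the benign base image while the latent MSE term pulls it toward the copyrighted target; it is this mismatch between pixel space and latent space that lets the trained LDM later reproduce the protected content.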
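On the detection side, one plausible reading of the encoder-decoder examination step is a round-trip consistency check: a clean image survives encoding and decoding almost unchanged, while a disguised sample decodes to the hidden content and so diverges from itself in pixel space. The sketch below assumes the same `vae` object as above; `flag_disguised` and the threshold `tau` are hypothetical names and values for illustration, not the paper's calibrated method.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_disguised(vae, images, tau=0.05):
    """Flag samples whose VAE reconstruction differs sharply from the input.

    `images` is an iterable of (1, 3, H, W) tensors in [-1, 1]. A large
    encode-decode error suggests the latent carries content other than
    what the image visually shows.
    """
    flags = []
    for x in images:
        z = vae.encode(x).latent_dist.mean
        recon = vae.decode(z).sample
        err = F.mse_loss(recon, x).item()
        flags.append(err > tau)  # large round-trip mismatch => likely disguised
    return flags
```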