Core Concepts
This study presents extensive analyses of detecting and attributing images generated by 12 state-of-the-art text-to-image diffusion models, including whether subtle variations in inference-stage hyperparameters can be identified and how post-editing enhancements affect attribution accuracy. The research also introduces a novel approach to uncovering detectable traces across different levels of visual granularity, from high-frequency perturbations to mid-level representations.
Summary
This study focuses on the task of detecting and attributing images generated by contemporary text-to-image (T2I) diffusion models. The key highlights are:
The authors developed a comprehensive dataset of nearly half a million AI-generated images from 12 state-of-the-art T2I models, using a diverse set of natural and surreal prompts.
They trained an image attributor to classify images among the 12 T2I generators plus real images (13 classes in total), achieving over 90% accuracy, far above the random-chance baseline of 7.69%.
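The random-chance baseline quoted in the study follows directly from the class count; a one-line check of that arithmetic:

```python
# Random-chance baseline for the 13-class attribution task
# (12 T2I generators + 1 "real image" class).
num_classes = 12 + 1
chance_accuracy = 1 / num_classes
print(f"{chance_accuracy:.2%}")  # → 7.69%
```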
The study explored the detectability of minor hyperparameter modifications during the inference stage of T2I diffusion models, such as model checkpoints, scheduler types, number of sampling steps, and initialization seeds. The results showed that even subtle variations in the generation process can be discerned to some extent.
The authors investigated the impact of user-driven post-editing workflows, including SDXL Inpainting, Photoshop Generative Fill, and Magnific AI upscaling, on the attribution accuracy. While performance degraded, the attributor maintained commendable accuracy levels above random chance.
To gain deeper insights into the detectable traces leveraged by the attributors, the study introduced a novel approach involving high-frequency perturbations and conversion to diverse mid-level representations, such as depth maps and Canny edges. Remarkably, training on style representations, specifically the Gram matrix, outperformed the attributor trained on original RGB images.
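The paper does not reproduce its implementation here, but the Gram-matrix style representation it refers to is a standard construction: the channel-by-channel inner products of a feature map, which capture texture statistics while discarding spatial layout. A minimal sketch with NumPy on a hypothetical feature map (the array shape and values are illustrative, not the study's actual features):

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a feature map of shape (C, H, W).

    Entry (i, j) is the inner product between the flattened
    activations of channels i and j, normalized by H * W.
    The result summarizes texture/style, independent of layout.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

# Toy stand-in for a CNN activation map (8 channels, 16x16 spatial)
feats = np.random.rand(8, 16, 16).astype(np.float32)
g = gram_matrix(feats)
print(g.shape)  # (8, 8)
```

In the study's setup, such Gram matrices (rather than raw RGB pixels) would form the attributor's input, which is what yields the reported 92.80% accuracy.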
Further analysis revealed that unique patterns in the layout and composition of generated images, captured through semantic segmentation, also provide detectable cues for attribution, achieving over twice the random chance accuracy.
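One simple way to turn a semantic segmentation map into a layout descriptor is per-class pixel fractions; this is an illustrative simplification, not the study's actual attribution pipeline, which trains on the segmentation maps themselves:

```python
import numpy as np

def layout_histogram(seg_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class pixel fractions of a semantic segmentation map,
    a coarse descriptor of scene layout and composition."""
    counts = np.bincount(seg_map.ravel(), minlength=num_classes)
    return counts / seg_map.size

# Toy segmentation map with 5 semantic classes
seg = np.random.randint(0, 5, size=(64, 64))
hist = layout_histogram(seg, num_classes=5)
print(hist)  # fractions summing to 1.0
```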
Overall, this comprehensive study advances the understanding of image forensics and the unique signatures left by state-of-the-art text-to-image diffusion models, paving the way for more robust detection and attribution of synthetic content.
Statistics
"Our top-performing attributor reaches an accuracy exceeding 90%, significantly surpassing the baseline random chance of merely 7.69%."
"The initialization seed achieves nearly 100% accuracy, which aligns with prior work by Yu et al. [76] that found different seeds lead to attributable GAN fingerprints."
"Introducing perturbations to high-frequency signals within images results in only minor performance decreases in the attributors."
"Training the image attributor using style representations—specifically, the Gram matrix—enhances accuracy beyond what is achievable with attributors trained on original RGB images."
Quotes
"Remarkably, the image attributor achieves an accuracy of 92.80% when trained on style representations, surpassing the performance of the attributor trained on original RGB images by 1.84%."
"Notably, altering high-frequency information causes only slight reductions in accuracy, and training an attributor on style representations outperforms training on RGB images."