This paper introduces Bi-LORA, a method that pairs vision-language models (VLMs) with low-rank adaptation (LoRA) tuning to improve the precision of synthetic image detection on images produced by unseen generative models.
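As a rough illustration of the low-rank adaptation idea mentioned above, the following minimal sketch keeps a weight matrix frozen and adds a trainable low-rank update scaled by alpha / rank. The class name `LoRALinear`, the helper `matvec`, and the initialization constants are assumptions made for this example; it is not the paper's implementation and does not involve a VLM.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank       # standard LoRA scaling factor
        # A gets a small random init, B starts at zero, so the update is initially zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))   # B @ (A @ x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of parameters in A and B (the only ones that would be tuned)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With a 4 x 6 weight and rank 2, only A (2 x 6) and B (4 x 2) would be trained instead of the full matrix; the saving grows with layer size.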
Large vision-language models can effectively distinguish authentic images from synthetic ones generated by advanced diffusion-based models, outperforming traditional image classification techniques.
Generative AI technology has enabled the creation of highly realistic synthetic images, posing significant challenges to the integrity of digital content. This work introduces SIDBench, a comprehensive benchmarking framework for reliably evaluating Synthetic Image Detection (SID) methods across diverse datasets and generative models.
CLIP features can be leveraged to build a lightweight yet highly generalizable and robust detector for AI-generated images, outperforming state-of-the-art methods with minimal training data.
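To make the idea of a lightweight detector on top of fixed features concrete, here is a minimal sketch that fits a logistic-regression probe on precomputed feature vectors. The toy 3-dimensional vectors are hypothetical stand-ins for CLIP embeddings, and logistic regression is an assumption chosen for brevity; it is not necessarily the classifier used in the work summarized above.

```python
import math

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights on fixed (frozen) feature vectors with SGD."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability of "synthetic"
            g = p - y                           # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 (synthetic) if the linear score is positive, else 0 (real)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical stand-ins for image embeddings (real encoders output hundreds of dims).
real = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [1.0, 0.0, 0.3]]
fake = [[0.1, 0.9, 0.2], [0.2, 0.8, 0.3], [0.0, 1.0, 0.1]]
features = real + fake
labels = [0, 0, 0, 1, 1, 1]
w, b = train_linear_probe(features, labels)
```

Because the encoder stays frozen, only `dim + 1` parameters are fitted, which is one reason such probes can work with very few training examples.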
HyperDet is a method for detecting synthetic images that uses hypernetworks and grouped Spatial Rich Model (SRM) filters to achieve state-of-the-art generalization across various generative models and datasets.
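For readers unfamiliar with SRM filters, the sketch below applies one widely used 5 x 5 high-pass kernel from the SRM family (often called the KV kernel) to a grayscale image stored as nested lists. The function name `srm_residual` is an assumption for this example, and the sketch shows only a single fixed filter; the filter grouping and hypernetwork components of HyperDet are not represented.

```python
# 5x5 "KV" high-pass kernel from the SRM family; integer weights, normalized by 12.
KV_KERNEL = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]
KV_DIVISOR = 12

def srm_residual(image, kernel=KV_KERNEL, divisor=KV_DIVISOR):
    """Valid-mode 2D correlation of a grayscale image with a high-pass kernel.

    The kernel weights sum to zero, so smooth image content is suppressed and
    only the high-frequency noise residual remains.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = sum(kernel[a][b] * image[i + a][j + b]
                      for a in range(k) for b in range(k))
            row.append(acc / divisor)
        out.append(row)
    return out
```

A constant image produces an all-zero residual, which is the property that makes such filters useful for exposing low-level generation artifacts rather than scene content.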