Core Concepts
Large multimodal models exhibit moderate fake image detection ability, preliminary interpretation and reasoning ability, and passable open-question answering ability for image authenticity.
Abstract
The paper proposes FakeBench, the first benchmark of its kind for evaluating the transparent fake image detection capabilities of large multimodal models (LMMs). FakeBench consists of three datasets:
- FakeClass: Evaluates the fake image detection ability of LMMs through yes-or-no and "what" questions about image authenticity.
- FakeClue: Examines the interpretation and reasoning abilities of LMMs via descriptions of the telltale clues that reveal image forgery. Two prompting modes are used: fault-finding (in-context) and inference (zero-shot).
- FakeQA: Assesses the open-question answering ability of LMMs on fine-grained aspects of image authenticity. A minimal sketch of all three protocols follows this list.
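To make the three protocols concrete, the following is a minimal sketch of how they could be driven against a model under test. Everything here is an assumption for illustration: `query_lmm` is a hypothetical stand-in for the evaluated model's API, and the prompt strings paraphrase the question types rather than reproduce FakeBench's actual prompts.

```python
# Hypothetical harness: query_lmm and all prompt strings below are
# illustrative assumptions, not FakeBench's actual prompts or API.

def query_lmm(image_path: str, prompt: str) -> str:
    """Stand-in for the vision-language model under evaluation:
    send one image plus a text prompt, receive text back."""
    raise NotImplementedError("plug in the LMM under test here")

def fakeclass_yes_no(image_path: str) -> str:
    # FakeClass: binary authenticity judgment via a yes-or-no question.
    return query_lmm(image_path,
                     "Is this image generated by an AI model? Answer yes or no.")

def fakeclue_inference(image_path: str) -> str:
    # FakeClue, inference (zero-shot) mode: the model must judge
    # authenticity itself and then justify the judgment, with no hints.
    return query_lmm(image_path,
                     "Judge whether this image is real or fake, and describe "
                     "the clues that support your judgment.")

def fakeclue_fault_finding(image_path: str, label: str) -> str:
    # FakeClue, fault-finding (in-context) mode: the ground-truth label is
    # given, so the model only has to point out the telltale clues.
    return query_lmm(image_path,
                     f"This image is {label}. Describe the visual clues "
                     "that reveal this.")

def fakeqa_open_question(image_path: str, aspect: str) -> str:
    # FakeQA: open-ended question about one fine-grained authenticity
    # aspect (e.g. lighting, texture, anatomy).
    return query_lmm(image_path,
                     f"Analyze the {aspect} of this image with respect to "
                     "its authenticity.")
```

A full harness would iterate these calls over the benchmark images and score the responses against the ground-truth labels and reference clue descriptions.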
The experimental results show that:
- A handful of LMMs exhibit moderate fake image detection capability, while the majority are still at a preliminary level.
- LMMs' interpretation ability generally outperforms their reasoning ability for fake image detection.
- Most LMMs struggle to provide satisfactory analyses on individual forgery aspects, indicating that open-question answering is an even harder task than comprehensive analysis.
The findings offer valuable insights into the current state of LMMs' abilities for transparent fake image detection and serve as an inspiring guide for future research in this area.
Statistics
The FakeBench dataset contains 6,000 real and fake images, with 54,000 questions covering detection, reasoning, interpretation, and open-question answering.
The fake images are generated by 10 different models, including GANs, diffusion models, and proprietary models.
The dataset covers a diverse range of content, including things, people, artwork, digital illustration, scenery, and creatures.
Quotes
"Current LMMs only have preliminary capabilities on interpreting image authenticity based on image details, or inferring image authenticity with reasonable logical chains in the form of natural languages."
"LMMs' fake reasoning ability generally lags behind their interpretation ability, indicating that interpreting image authenticity is a simpler task compared to engaging in deeper reasoning and understanding the underlying logic that involves more complex cognitive processes."