Davidsonian Scene Graph: Improving Text-to-Image Evaluation
Core Concepts
Davidsonian Scene Graph (DSG) improves reliability in fine-grained evaluation for text-to-image generation by addressing challenges in question generation and answering.
Summary
Abstract:
Evaluating text-to-image models is challenging.
The question generation and answering (QG/A) approach uses pre-trained models to generate questions from the prompt and to answer them against the generated image.
DSG addresses reliability challenges found in prior QG/A work.
Introduction:
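Text-to-image (T2I) models are assessed using either text-image similarity scores or QG/A frameworks.
In a QG/A framework, a QG module generates questions from the prompt, a visual question answering (VQA) module answers them against the image, and a score is computed from the answers.
QG/A approaches provide more calibrated and interpretable evaluations than a single similarity score.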
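The QG/A pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `qga_score`, `toy_qg`, and `toy_vqa` are hypothetical names, the QG and VQA modules are replaced by stubs, and the score is assumed to be the fraction of questions answered "yes".

```python
from typing import Any, Callable, List


def qga_score(
    prompt: str,
    image: Any,
    generate_questions: Callable[[str], List[str]],
    answer_question: Callable[[Any, str], bool],
) -> float:
    """Fraction of generated questions that the VQA module answers 'yes'."""
    questions = generate_questions(prompt)
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if answer_question(image, q))
    return correct / len(questions)


# Hypothetical stand-ins for the pre-trained QG and VQA models.
def toy_qg(prompt: str) -> List[str]:
    # A real QG module would derive these from the prompt text.
    return ["Is there a dog?", "Is there a ball?", "Is the ball red?"]


def toy_vqa(image: dict, question: str) -> bool:
    # The "image" here is just a lookup table of ground-truth answers.
    return image.get(question, False)


toy_image = {
    "Is there a dog?": True,
    "Is there a ball?": True,
    "Is the ball red?": False,
}
score = qga_score("a dog next to a red ball", toy_image, toy_qg, toy_vqa)
```

Because each question is answered separately, a low score can be traced back to the specific questions that failed, which is where the interpretability comes from.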
Davidsonian Scene Graph:
DSG is inspired by formal semantics to address reliability issues in QG/A methods.
DSG produces atomic questions organized in dependency graphs for better semantic coverage.
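To illustrate how a dependency graph can affect scoring, the sketch below counts a question as satisfied only if it and all of its parent questions are answered "yes". The function name, the question IDs, and this exact scoring rule are assumptions made for illustration; the summary above does not specify how DSG uses the dependencies at scoring time.

```python
from typing import Dict, List


def dependency_aware_score(
    answers: Dict[str, bool],
    parents: Dict[str, List[str]],
) -> float:
    """Score where a question counts only if all its ancestors also hold.

    Assumes the dependency graph is acyclic.
    """
    resolved: Dict[str, bool] = {}

    def resolve(qid: str) -> bool:
        if qid not in resolved:
            resolved[qid] = answers[qid] and all(
                resolve(p) for p in parents.get(qid, [])
            )
        return resolved[qid]

    if not answers:
        return 0.0
    return sum(resolve(q) for q in answers) / len(answers)


# Hypothetical atomic questions for "a blue motorcycle next to a door".
answers = {
    "q1_motorcycle_exists": False,
    "q2_motorcycle_is_blue": True,   # inconsistent: no motorcycle was found
    "q3_door_exists": True,
    "q4_motorcycle_next_to_door": True,
}
parents = {
    "q2_motorcycle_is_blue": ["q1_motorcycle_exists"],
    "q4_motorcycle_next_to_door": ["q1_motorcycle_exists", "q3_door_exists"],
}

naive = sum(answers.values()) / len(answers)
dsg_like = dependency_aware_score(answers, parents)
```

In this toy case the naive score is 0.75, while the dependency-aware score is 0.25, since the answers about the motorcycle's color and position are discarded once the motorcycle itself is judged absent.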
Experiments and Discussion:
Evaluation of DSG questions shows high precision, recall, atomicity, uniqueness, and valid dependencies.
VQA accuracy correlates well with human judgments using DSG questions.
In per-question evaluation, PaLI performs best among the VQA models compared, with a 73.8% matching ratio against human answers.
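A per-question matching ratio of this kind can be computed as the share of questions on which the VQA model's answer equals the human answer. The sketch below assumes that definition and uses made-up answers; `matching_ratio` is a hypothetical helper, not a function from the paper's code.

```python
from typing import Sequence


def matching_ratio(
    model_answers: Sequence[str],
    human_answers: Sequence[str],
) -> float:
    """Fraction of questions where the model answer equals the human answer."""
    if len(model_answers) != len(human_answers):
        raise ValueError("answer lists must have the same length")
    if not model_answers:
        return 0.0
    matches = sum(m == h for m, h in zip(model_answers, human_answers))
    return matches / len(model_answers)


# Toy data for illustration only.
vqa_answers = ["yes", "no", "yes", "yes"]
human_answers = ["yes", "yes", "yes", "no"]
ratio = matching_ratio(vqa_answers, human_answers)
```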
Text-to-Image Evaluation:
Comparing T2I models by VQA scores on DSG prompts gives the ranking Imagen* ≃ MUSE* > SD v2.1.
Performance varies across different prompt sources, highlighting challenges in categories like counting and text rendering.