Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Core Concepts
The paper proposes SimM, a training-free system that calibrates layout inconsistencies in text-to-image generation.
Abstract
The paper introduces SimM, a system that rectifies layout inconsistencies in text-to-image generation. It follows a "check-locate-rectify" pipeline: it analyzes the prompt and the intermediate outputs to detect layout errors, then adjusts the generation to correct them. The system improves layout fidelity without additional training or loss-based updates. Experiments on the DrawBench and SimMBench datasets show better results than the baselines.
Introduction
Text-to-image generation is promising but challenging.
Methodology
Stable Diffusion model overview.
Determining layout correction initiation.
Locating activated regions and rectification process.
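The locate and rectify steps listed above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes a single 2D attention map per object, a relative activation threshold, an IoU test to decide whether correction is needed, and a nearest-neighbour copy of activations into the target box. The function names (locate_box, iou, rectify), the threshold values, and the box format are all hypothetical.

```python
import numpy as np

def locate_box(attn, rel_threshold=0.5):
    """Bounding box (top, left, bottom, right) of the activated region.

    Assumption: a pixel counts as activated when its value reaches
    rel_threshold * attn.max(). Bottom and right are exclusive.
    """
    mask = attn >= rel_threshold * attn.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)

def iou(a, b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, bottom - top) * max(0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def rectify(attn, source, target):
    """Copy the activations in `source` into `target`, clearing the old spot."""
    st, sl, sb, sr = source
    tt, tl, tb, tr = target
    patch = attn[st:sb, sl:sr].copy()
    # Nearest-neighbour resize of the patch to the target box size.
    ri = np.arange(tb - tt) * patch.shape[0] // (tb - tt)
    ci = np.arange(tr - tl) * patch.shape[1] // (tr - tl)
    out = attn.copy()
    out[st:sb, sl:sr] = attn.min()          # suppress the misplaced activation
    out[tt:tb, tl:tr] = patch[np.ix_(ri, ci)]
    return out

# Toy 8x8 map: the object is activated at the top right,
# while the (hypothetical) target layout places it at the bottom left.
attn = np.zeros((8, 8))
attn[1:3, 5:7] = 1.0
target = (5, 0, 7, 2)

source = locate_box(attn)
needs_fix = iou(source, target) < 0.5       # hypothetical decision rule
if needs_fix:
    attn = rectify(attn, source, target)
```

In a real diffusion model this kind of operation would apply to the cross-attention maps of the denoising network at intermediate steps; the toy array here only stands in for one such map.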
Experiments
Evaluation on DrawBench and SimMBench datasets.
Results
Quantitative comparison with baselines shows SimM's superiority.
Ablation Study
Intra-/inter-map activation adjustments significantly impact layout rectification.
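The summary does not define the two adjustments, so the following is only one plausible reading, given as a sketch under stated assumptions. Here the intra-map adjustment strengthens activations inside an object's target box and weakens them elsewhere within that object's own map, while the inter-map adjustment weakens every other object's map inside that box. The function names, the boost and damp factors, and the dictionary layout are hypothetical.

```python
import numpy as np

def intra_map_adjust(attn, box, boost=2.0, damp=0.5):
    """Within one map: amplify inside the target box, attenuate outside."""
    t, l, b, r = box
    out = attn * damp
    out[t:b, l:r] = attn[t:b, l:r] * boost
    return out

def inter_map_adjust(maps, boxes, damp=0.5):
    """Across maps: inside each object's box, attenuate the other objects."""
    out = {tok: m.copy() for tok, m in maps.items()}
    for owner, (t, l, b, r) in boxes.items():
        for tok in out:
            if tok != owner:
                out[tok][t:b, l:r] *= damp
    return out

# Two objects on a 4x4 grid: "cat" on the left half, "dog" on the right half.
boxes = {"cat": (0, 0, 4, 2), "dog": (0, 2, 4, 4)}
maps = {"cat": np.ones((4, 4)), "dog": np.ones((4, 4))}

adjusted = {tok: intra_map_adjust(m, boxes[tok]) for tok, m in maps.items()}
adjusted = inter_map_adjust(adjusted, boxes)
```

After both steps each object's map dominates inside its own box, which is the effect the ablation attributes to these adjustments; the exact update rule used by SimM may differ.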
Further Analysis
Effect of the number of localization steps T_loc on fidelity.
Conclusion
Proposal of SimM for layout calibration in text-to-image generation.
Stats
"SimM achieves the highest generation accuracy and CLIP-Score."
"Compared to baselines, SimM outperforms by a significant margin of 9.5% in accuracy."
"On the SimMBench dataset, SimM surpasses baselines by 14.45% in accuracy."