Core Concepts
Improving prompt-image consistency through optimization.
Abstract
This work examines why text-to-image (T2I) generative models often produce images that are inconsistent with their prompts, and introduces OPT2I, a framework that addresses this problem. OPT2I uses a large language model (LLM) to iteratively generate revised prompts that maximize a prompt-image consistency score. Extensive validation on two datasets shows significant improvements in consistency scores while maintaining image quality and diversity. The framework aims to make T2I systems more reliable and robust.
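The optimization loop described above can be sketched as a simple search over prompts. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` is a hypothetical stand-in for the LLM call, and `score` is a hypothetical stand-in for generating images from a candidate prompt and rating their consistency. The sketch assumes consistency is always measured against the original user prompt, so a revision cannot "win" by drifting away from the user's intent. The toy stubs at the end only exist so the loop runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    prompt: str
    score: float


# Hypothetical: propose(user_prompt, ranked_history) -> revised prompts (the LLM step).
Proposer = Callable[[str, List[Candidate]], List[str]]
# Hypothetical: score(user_prompt, candidate_prompt) -> consistency of images generated
# from candidate_prompt, measured against the ORIGINAL user_prompt (assumption).
Scorer = Callable[[str, str], float]


def optimize_prompt(user_prompt: str, propose: Proposer, score: Scorer,
                    iterations: int = 5) -> Candidate:
    history = [Candidate(user_prompt, score(user_prompt, user_prompt))]
    for _ in range(iterations):
        # Show the proposer the scored history, worst to best,
        # so it can see which revisions helped.
        ranked = sorted(history, key=lambda c: c.score)
        for revised in propose(user_prompt, ranked):
            history.append(Candidate(revised, score(user_prompt, revised)))
    # The user prompt is itself a candidate, so the result never scores below it.
    return max(history, key=lambda c: c.score)


# Toy stand-ins (hypothetical) so the loop runs without any model.
def toy_propose(user_prompt: str, ranked: List[Candidate]) -> List[str]:
    best = ranked[-1].prompt  # ranked ascending, so the last entry is the best so far
    return [best + ", highly detailed", best + ", two cats clearly visible"]


def toy_score(user_prompt: str, candidate: str) -> float:
    # Pretend each extra descriptive clause adds consistency, capped at 1.0.
    return min(1.0, 0.5 + 0.1 * candidate.count(","))


best = optimize_prompt("two cats on a red sofa", toy_propose, toy_score)
print(best.prompt, best.score)
```

Keeping the original prompt in the candidate pool is a deliberate choice in this sketch: the search can only match or improve on the user's starting score.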
Stats
Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score.
OPT2I can boost the initial consistency score by up to 24.9%, as measured by the Davidsonian Scene Graph (DSG) score.
The LLM iteratively improves a user-provided text prompt by suggesting alternative prompts that lead to images more aligned with the user's intention.
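To make the DSG metric mentioned above more concrete, here is a rough sketch of how a DSG-style score can be aggregated. It assumes, as we understand the metric, that the prompt is decomposed into atomic yes/no questions with dependencies, a visual question answering (VQA) model answers each question about the image, and a question counts as failed whenever one of its parent questions failed. Question generation and the VQA model are out of scope here; the `Question` type, `dsg_score` function, and the example questions and answers are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    qid: int
    text: str
    parents: List[int] = field(default_factory=list)  # questions this one depends on


def dsg_score(questions: List[Question], vqa_answers: Dict[int, bool]) -> float:
    """Fraction of questions answered 'yes', where a question is forced to 'no'
    if any parent question was answered 'no'. Assumes parents precede children."""
    final: Dict[int, bool] = {}
    for q in questions:
        parents_ok = all(final[p] for p in q.parents)
        final[q.qid] = parents_ok and vqa_answers[q.qid]
    return sum(final.values()) / len(questions)


# Hypothetical questions for the prompt "two cats on a red sofa".
questions = [
    Question(1, "Is there a cat?"),
    Question(2, "Are there two cats?", [1]),
    Question(3, "Is there a sofa?"),
    Question(4, "Is the sofa red?", [3]),
    Question(5, "Are the cats on the sofa?", [1, 3]),
]
# Hypothetical VQA answers for one generated image: only one cat was drawn.
answers = {1: True, 2: False, 3: True, 4: True, 5: True}
print(dsg_score(questions, answers))
```

The dependency rule matters: if the VQA model answers "no" to "Is there a cat?", answers to follow-up questions about the cats are discarded rather than counted as successes.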
Quotes
"Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs."
"OPT2I consistently outperforms paraphrasing baselines and can boost the prompt-image consistency by up to 24.9%."