Key Concept
Leveraging the semantic knowledge of Large Language Models (LLMs) to guide the diffusion process significantly improves the ability of text-to-image diffusion models to generate images from prompts containing rare or unusual compositions of concepts.
Statistics
On RareBench, R2F outperforms the best baseline in each case by 3.1%p to 28.1%p (percentage points) in GPT-4o evaluation and by 0.6%p to 19.4%p in human evaluation.
In GPT-4o evaluation, R2F outperforms the best baselines by 2.7%p to 5.5%p on DVMP and by 0.1%p to 3.6%p on T2I-CompBench.
Quotes
"State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes."
"Our study starts from the following research question: Do pre-trained diffusion models possess the potential power to compose rare concepts, and can this be unlocked by a training-free approach?"
"Based on this, we propose a novel approach, called Rare-to-Frequent (R2F), that leverages an LLM to find frequent concepts relevant to rare concepts in prompts and uses them to guide diffusion inference, enabling more precise image synthesis."
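
The last quote describes guiding diffusion inference with frequent concepts that an LLM associates with rare ones. The sketch below illustrates one way such guidance could be arranged, and it is an assumption rather than the authors' implementation: the early denoising steps are conditioned on a prompt where the rare concept is swapped for a frequent one, and the remaining steps use the original prompt. The function name build_prompt_schedule, the switch_step parameter, and the example mapping are hypothetical; in practice the mapping would come from an LLM, and the schedule would be passed to a diffusion sampler that accepts a per-step prompt.

```python
from typing import Dict, List


def build_prompt_schedule(
    rare_prompt: str,
    replacements: Dict[str, str],
    num_steps: int,
    switch_step: int,
) -> List[str]:
    """Return one prompt per denoising step (hypothetical helper).

    `replacements` maps a rare concept to a more frequent one. Here it is
    hard-coded by the caller; the summarized work obtains such mappings
    from an LLM. `switch_step` is an assumed hyperparameter, not a value
    taken from the source.
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")

    # Build the frequent-concept prompt by plain substring substitution.
    frequent_prompt = rare_prompt
    for rare, frequent in replacements.items():
        frequent_prompt = frequent_prompt.replace(rare, frequent)

    # Early steps use the frequent prompt, later steps the original one.
    return [
        frequent_prompt if step < switch_step else rare_prompt
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # Illustrative mapping only; not produced by an actual LLM call.
    schedule = build_prompt_schedule(
        "a hairy frog", {"frog": "animal"}, num_steps=5, switch_step=2
    )
    for step, prompt in enumerate(schedule):
        print(step, prompt)
```

A fixed switch point is the simplest possible choice for this illustration; the summary does not state how or when the method moves from frequent to rare concepts, so that detail should be taken from the paper itself.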