ELLA: Enhancing Text-to-Image Models with Large Language Models for Dense Prompt Alignment
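The authors introduce ELLA, a method that equips text-to-image diffusion models with Large Language Models (LLMs) to improve text alignment without training either the U-Net or the LLM. Only a lightweight connector between the two frozen models is trained: the Timestep-Aware Semantic Connector (TSC), which dynamically extracts timestep-dependent conditions from the LLM's features so that the diffusion model can better follow long, dense prompts.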
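
To make the idea of a timestep-aware connector concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one plausible design that the summary above does not spell out: a fixed set of learnable query tokens is modulated by a sinusoidal timestep embedding (scale and shift) and then cross-attends over the frozen LLM's token features. The class name `ToyTimestepAwareConnector`, all parameter names, the sizes, and the random weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion timestep (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class ToyTimestepAwareConnector:
    """Hypothetical stand-in for a TSC-like module (assumed design, untrained)."""

    def __init__(self, dim, num_queries, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Learnable query tokens; random here because nothing is trained.
        self.queries = rng.normal(size=(num_queries, dim))
        # Projections mapping the timestep embedding to a scale and a shift.
        self.w_scale = rng.normal(size=(dim, dim)) * 0.1
        self.w_shift = rng.normal(size=(dim, dim)) * 0.1

    def __call__(self, llm_features, t):
        emb = timestep_embedding(t, self.dim)
        scale = emb @ self.w_scale
        shift = emb @ self.w_shift
        # Timestep-dependent modulation of the normalized queries.
        q = layer_norm(self.queries) * (1.0 + scale) + shift
        # Cross-attention: each query attends over the LLM token features.
        attn = softmax(q @ llm_features.T / np.sqrt(self.dim))
        return attn @ llm_features


# Stand-in for frozen LLM outputs: 12 prompt tokens, 16 features each.
rng = np.random.default_rng(1)
llm_features = rng.normal(size=(12, 16))

tsc = ToyTimestepAwareConnector(dim=16, num_queries=4)
early = tsc(llm_features, t=900)  # a noisy, early denoising step
late = tsc(llm_features, t=100)   # a late, nearly clean step
print(early.shape, np.abs(early - late).max())
```

In this sketch the same prompt features yield different condition tokens at different timesteps, and the output length is fixed by the number of queries rather than by the prompt length. Both properties are consequences of the assumed design, not claims about the paper's exact architecture.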