ELLA enhances text-to-image diffusion models by incorporating Large Language Models (LLM) for improved semantic alignment without the need to train U-Net or LLM.