Efficient Inference in Text-to-Image Diffusion Models by Selective Cross-Attention Caching
Cross-attention outputs in text-to-image diffusion models can be selectively cached and reused across denoising steps, significantly improving inference efficiency without compromising generation quality.
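The intuition behind the claim can be sketched in a small toy example: compute cross-attention against the text embedding for the first few denoising steps, then freeze and reuse the result for the remaining steps. This is a minimal NumPy sketch, not the paper's method; the class name `CachedCrossAttention`, the `cache_after_step` threshold, and the single-head, unbatched shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CachedCrossAttention:
    """Toy single-head cross-attention that recomputes its output for the
    first `cache_after_step` denoising steps, then reuses the last result.
    A sketch of the selective-caching idea; shapes and threshold are
    illustrative, not the paper's actual schedule."""

    def __init__(self, dim, cache_after_step, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.Wq = rng.standard_normal((dim, dim)) * scale
        self.Wk = rng.standard_normal((dim, dim)) * scale
        self.Wv = rng.standard_normal((dim, dim)) * scale
        self.cache_after_step = cache_after_step
        self._cached = None       # last computed attention output
        self.num_computes = 0     # counts full attention evaluations

    def __call__(self, latents, text_emb, step):
        # Late in denoising, the text conditioning signal changes little,
        # so the cached output is reused as an approximation.
        if step >= self.cache_after_step and self._cached is not None:
            return self._cached
        self.num_computes += 1
        q = latents @ self.Wq                   # (n_latent, dim)
        k = text_emb @ self.Wk                  # (n_tokens, dim)
        v = text_emb @ self.Wv                  # (n_tokens, dim)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        self._cached = attn @ v                 # store for later reuse
        return self._cached
```

Running a 10-step toy loop with `cache_after_step=4` performs only 4 full attention evaluations instead of 10; the remaining steps return the cached tensor, which is where the inference savings come from.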