Efficient U-Shaped Diffusion Transformers for High-Quality Latent-Space Image Generation
We introduce U-shaped Diffusion Transformers (U-DiTs), which use downsampled self-attention to achieve state-of-the-art performance on latent-space image generation at a significantly lower computational cost than isotropic Diffusion Transformers (DiTs).
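To give a rough sense of why downsampling before self-attention can cut cost, the sketch below counts query-key pairs, which grow quadratically with the number of tokens. It is a back-of-the-envelope illustration only, not the U-DiT implementation: the function names, the 32x32 token grid, and the assumption that attention runs on a single grid downsampled by a factor of 2 per side are all hypothetical choices made for this example.

```python
def attention_pairs(height: int, width: int) -> int:
    """Query-key pairs for full self-attention over a height x width token grid."""
    tokens = height * width
    return tokens * tokens


def downsampled_attention_pairs(height: int, width: int, factor: int = 2) -> int:
    """Pairs when attention runs on a grid downsampled by `factor` per side.

    Assumption for illustration: one downsampled grid is attended over;
    the exact downsampling scheme is not specified in the text above.
    """
    if height % factor or width % factor:
        raise ValueError("grid sides must be divisible by the downsampling factor")
    return attention_pairs(height // factor, width // factor)


# Hypothetical 32x32 latent token grid.
full = attention_pairs(32, 32)
reduced = downsampled_attention_pairs(32, 32, factor=2)
print(full, reduced, full // reduced)  # 1048576 65536 16
```

Under these assumptions, halving each side of the grid leaves a quarter of the tokens and a sixteenth of the query-key pairs. Actual savings depend on the rest of the architecture and on how the downsampled features are produced and merged.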