Scalable State Space Diffusion Models for Efficient Image Generation
This paper presents Diffusion State Space models (DiS), a simple and general state-space-based framework for efficient image generation with diffusion models. DiS treats all inputs, including time, conditions, and noisy image patches, as tokens concatenated into a single sequence, and adopts a state space backbone to model long-range dependencies across that sequence.
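The token-concatenation idea can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the DiS architecture itself: it assumes NumPy, uses random vectors in place of learned time and condition embeddings, and substitutes a plain diagonal linear state space recurrence for the actual backbone. The helper names (`patchify`, `build_tokens`, `ssm_scan`) and all sizes are hypothetical.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * C)

def build_tokens(t_emb, c_emb, patch_tokens):
    """Concatenate time, condition, and patch tokens into one sequence."""
    return np.concatenate([t_emb[None, :], c_emb[None, :], patch_tokens], axis=0)

def ssm_scan(tokens, a, B, C):
    """Diagonal linear state space recurrence (an illustrative stand-in):
        h_k = a * h_{k-1} + B @ x_k
        y_k = C @ h_k
    """
    h = np.zeros(a.shape[0])
    out = np.empty((tokens.shape[0], C.shape[0]))
    for k in range(tokens.shape[0]):
        h = a * h + B @ tokens[k]
        out[k] = C @ h
    return out

rng = np.random.default_rng(0)

# Assumed toy sizes: an 8x8 RGB "noisy image" cut into 4x4 patches.
image = rng.normal(size=(8, 8, 3))
patches = patchify(image, 4)      # 4 patches, each of dimension 48
D = patches.shape[1]

# Random placeholders for the time and condition embeddings (assumption).
t_emb = rng.normal(size=D)
c_emb = rng.normal(size=D)
tokens = build_tokens(t_emb, c_emb, patches)  # 2 + 4 = 6 tokens

# Random state space parameters with a stable diagonal transition (|a| < 1).
N = 16
a = rng.uniform(0.5, 0.99, size=N)
B = rng.normal(size=(N, D)) / np.sqrt(D)
C = rng.normal(size=(D, N)) / np.sqrt(N)

y = ssm_scan(tokens, a, B, C)
print(tokens.shape, y.shape)
```

In this sketch the time and condition tokens are placed first, so the hidden state already carries their information when the recurrence reaches the patch tokens. The scan costs time linear in the number of tokens, which is the usual motivation for state space backbones on long sequences.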