The author proposes using state-space models (SSMs) to reduce the memory consumption of diffusion-based video generation. Because the memory cost of an SSM layer grows linearly with sequence length, whereas that of a standard attention layer grows quadratically, the model can handle longer video sequences while maintaining competitive generation performance.
In short, SSMs offer a more memory-efficient alternative to attention-based models for video generation while remaining competitive in generation quality.
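As a rough illustration of where the memory saving comes from, the sketch below runs a tiny diagonal linear SSM as a recurrence and compares the size of its carried state with the size of a full attention score matrix. It is not the author's architecture: the function names `ssm_scan`, `attention_score_entries`, and `ssm_state_entries`, the diagonal parameterization, and the scalar input sequence are all assumptions chosen to keep the example small.

```python
def ssm_scan(u, a, b, c):
    """Run a diagonal linear SSM over a 1-D input sequence (illustrative only).

    Recurrence, applied elementwise over the state dimensions:
        h_t = a * h_{t-1} + b * u_t
        y_t = sum(c * h_t)
    Only the current state h is carried between steps, so the working
    memory does not grow with the sequence length.
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for u_t in u:
        h = [a_i * h_i + b_i * u_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


def attention_score_entries(seq_len):
    """Entries in a full (seq_len x seq_len) attention score matrix."""
    return seq_len * seq_len


def ssm_state_entries(state_dim):
    """Entries in the SSM hidden state; independent of sequence length."""
    return state_dim


if __name__ == "__main__":
    # Impulse input: the output decays geometrically with factor a = 0.5.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    for seq_len in (256, 1024, 4096):
        print(seq_len, attention_score_entries(seq_len), ssm_state_entries(16))
```

Under these assumptions, quadrupling the sequence length multiplies the attention score matrix by sixteen, while the SSM state stays at a fixed size. Memory-efficient attention kernels avoid materializing the full matrix, so the comparison applies to the standard formulation only.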