Efficiently training text-to-video models using a tailored noise prior leads to state-of-the-art video generation quality.
Diffusion generative models have recently demonstrated remarkable capabilities for producing and modifying coherent, high-quality video content. This survey provides a systematic overview of the key aspects of video diffusion models, including their applications, architectural choices, and techniques for modeling temporal dynamics.
This survey provides a comprehensive review of recent advancements in video diffusion models, covering their applications in video generation, video editing, and various video understanding tasks.