MagicTime is a novel framework for generating high-quality metamorphic time-lapse videos that accurately depict real-world physical phenomena.
CameraCtrl enables accurate control over camera viewpoints in text-to-video generation by learning a plug-and-play camera module that leverages Plücker embeddings to represent camera parameters.
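As a rough illustration of the Plücker representation mentioned above (not the official CameraCtrl code), each pixel's viewing ray through the camera center o with unit direction d can be encoded as the 6-vector (d, o × d), giving a dense per-frame conditioning map; the function and variable names below are illustrative.

```python
# Minimal sketch: per-pixel Plücker embeddings from camera intrinsics/extrinsics.
import numpy as np

def plucker_embedding(K: np.ndarray, R: np.ndarray, t: np.ndarray,
                      height: int, width: int) -> np.ndarray:
    """Return a (height, width, 6) Plücker map for a camera with
    intrinsics K and world-to-camera extrinsics [R | t]."""
    o = -R.T @ t                                    # camera center in world coordinates
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (H, W, 3)
    dirs_cam = pix @ np.linalg.inv(K).T             # back-project pixels to camera-space rays
    dirs_world = dirs_cam @ R                       # apply R.T to rotate rays into world frame
    d = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    m = np.cross(np.broadcast_to(o, d.shape), d)    # ray moment o x d
    return np.concatenate([d, m], axis=-1)          # (H, W, 6) conditioning map for one frame
```

Stacking one such map per frame yields the camera trajectory representation that a plug-and-play camera module can consume alongside the video latents.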
VIDIM, a generative model for video interpolation, creates short videos given a start and end frame by using cascaded diffusion models to generate the target video at low resolution and then at high resolution, enabling high-fidelity results even for complex, nonlinear, or ambiguous motions.
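The cascade described above can be pictured as two sampling stages; the sketch below is purely illustrative (the samplers are placeholders, not VIDIM's models): a base diffusion model generates the in-between frames at low resolution conditioned on the endpoints, and a super-resolution diffusion model then upsamples the result while re-conditioning on the high-resolution start and end frames.

```python
# Illustrative two-stage cascade in the spirit of VIDIM (not the authors' code).
import numpy as np

def sample_base(start_lr, end_lr, num_frames, sample_fn):
    """Stage 1: sample a low-resolution video conditioned on downsampled endpoints."""
    return sample_fn(cond=(start_lr, end_lr), shape=(num_frames, *start_lr.shape))

def sample_superres(video_lr, start_hr, end_hr, sample_fn):
    """Stage 2: upsample the low-res video, conditioned on it and the hi-res endpoints."""
    num_frames = video_lr.shape[0]
    return sample_fn(cond=(video_lr, start_hr, end_hr),
                     shape=(num_frames, *start_hr.shape))

def dummy_sampler(cond, shape):
    # Placeholder standing in for a trained diffusion model's sampling loop.
    return np.random.randn(*shape)

start_hr = np.random.randn(256, 256, 3)
end_hr = np.random.randn(256, 256, 3)
start_lr, end_lr = start_hr[::4, ::4], end_hr[::4, ::4]      # naive 4x downsample
video_lr = sample_base(start_lr, end_lr, num_frames=9, sample_fn=dummy_sampler)
video_hr = sample_superres(video_lr, start_hr, end_hr, sample_fn=dummy_sampler)
```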
This research introduces a novel approach for customizing motion in video generation from text prompts, addressing the underexplored challenge of motion representation. The proposed Motion Embeddings enable the disentanglement of motion and appearance, facilitating more creative, customized, and controllable video generation.
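One plausible way to realize such motion embeddings, shown here only as a hedged assumption rather than the paper's implementation, is a small learnable tensor with one vector per frame index that is added to the hidden states entering the temporal attention layers and optimized on a reference video while the base model stays frozen, so motion is captured by the embedding and appearance stays with the frozen weights.

```python
# Illustrative sketch of per-frame learnable motion embeddings (an assumption,
# not the paper's code): added to hidden states before temporal attention.
import torch
import torch.nn as nn

class MotionEmbedding(nn.Module):
    def __init__(self, num_frames: int, dim: int):
        super().__init__()
        # one learnable vector per frame index
        self.embed = nn.Parameter(torch.zeros(num_frames, dim))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, frames, tokens, dim) entering a temporal layer
        return hidden_states + self.embed[None, :, None, :]

x = torch.randn(1, 16, 64, 320)
print(MotionEmbedding(num_frames=16, dim=320)(x).shape)  # torch.Size([1, 16, 64, 320])
```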
Video diffusion models exhibit a greater tendency to replicate training data compared to image generation models, posing challenges for the originality of generated content. Strategies are needed to detect and mitigate this replication issue.
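A common way to detect such replication, sketched below under our own assumptions rather than any specific paper's protocol, is to embed generated frames and training frames with a fixed feature extractor and flag generations whose nearest training neighbour exceeds a cosine-similarity threshold; the 0.95 threshold and random features are placeholders.

```python
# Hedged sketch of a nearest-neighbour replication check.
import numpy as np

def nearest_train_similarity(gen_feats: np.ndarray, train_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity of each generated-frame feature to its closest training feature."""
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    return (g @ t.T).max(axis=1)                    # (num_generated,)

gen = np.random.randn(8, 512)                       # stand-in for real frame embeddings
train = np.random.randn(1000, 512)
flags = nearest_train_similarity(gen, train) > 0.95
print(flags)
```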
UniCtrl introduces a novel method to enhance spatiotemporal consistency in videos generated by text-to-video models without additional training.
The study presents an innovative approach to generating talking-face videos that takes context into account and uses an efficient two-stage cross-modal control pipeline.
Large video generative models require a comprehensive evaluation framework beyond simple metrics to assess performance accurately.
This work proposes a novel evaluation framework for large video generative models that assesses visual quality, content quality, motion quality, and text-video alignment.
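As a minimal sketch of how scores along these four dimensions might be collected and aggregated, the structure below uses the dimension names from the summary above; the weights and example scores are purely illustrative, not part of any benchmark.

```python
# Illustrative aggregation of multi-dimensional video evaluation scores.
from dataclasses import dataclass

@dataclass
class VideoEvalScores:
    visual_quality: float         # frame-level fidelity and aesthetics
    content_quality: float        # semantic plausibility of objects and scenes
    motion_quality: float         # temporal smoothness and dynamics
    text_video_alignment: float   # agreement between prompt and generated video

    def weighted_overall(self, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
        parts = (self.visual_quality, self.content_quality,
                 self.motion_quality, self.text_video_alignment)
        return sum(w * s for w, s in zip(weights, parts))

print(VideoEvalScores(0.8, 0.7, 0.6, 0.9).weighted_overall())
```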
StreamingT2V enables seamless long video generation from text with high motion dynamics and consistency.