This paper introduces TIP-I2V, a dataset of over 1.7 million user-provided text and image prompts for image-to-video generation, intended to support research on both improving model performance and addressing safety concerns.
FrameBridge is an image-to-video generation framework that replaces the noise-to-data process of diffusion models with a bridge-model-based, data-to-data process, yielding improved appearance consistency and temporal coherence in synthesized videos.
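To illustrate the data-to-data idea behind bridge models, the sketch below samples the marginal of a Brownian bridge pinned to the target video at one end and the input image (replicated along the time axis) at the other. This is a generic bridge construction under assumed tensor shapes and names (`bridge_sample`, `sigma`), not FrameBridge's exact formulation.

```python
import torch

def bridge_sample(x0_video, x1_image_stack, s, sigma=1.0):
    """Sample the Brownian-bridge marginal between the target video x0 (s=0)
    and the input image replicated along the time axis, x1 (s=1).
    Generic data-to-data sketch; FrameBridge's actual design may differ."""
    s = s.view(-1, 1, 1, 1, 1)                      # broadcast over [B, F, C, H, W]
    mean = (1.0 - s) * x0_video + s * x1_image_stack
    std = sigma * torch.sqrt(s * (1.0 - s))         # vanishes at both endpoints
    return mean + std * torch.randn_like(x0_video)

# Example shapes: batch of 2 videos with 16 frames of 64x64 RGB latents.
x0 = torch.randn(2, 16, 3, 64, 64)
x1 = torch.randn(2, 1, 3, 64, 64).expand_as(x0)    # input image repeated per frame
xs = bridge_sample(x0, x1, s=torch.rand(2))
```

A network would then be trained to reverse this process, starting from the image prior rather than from pure Gaussian noise.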
Image-to-video diffusion models often generate videos with less motion than expected due to conditional image leakage: the model over-relies on the conditional input image and neglects the motion information carried by the noisy input. The paper identifies the issue and proposes mitigations at both inference and training time, yielding more dynamic and accurate video generation.
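The training-time mitigation can be pictured as perturbing the conditional image more heavily at large timesteps, so the model cannot lean on it when the noisy latents are least informative. Below is a minimal PyTorch sketch assuming a standard I2V diffusion training loop; the linear schedule and the names (`perturb_condition`, `max_sigma`) are illustrative, not the paper's exact method.

```python
import torch

def perturb_condition(cond_latents, t, T, max_sigma=0.3):
    """Noise the conditional image more heavily at large timesteps so the
    model cannot over-rely on it when the noisy video latents are least
    informative. Linear schedule and max_sigma are illustrative choices."""
    sigma = max_sigma * (t.float() / T)             # stronger noise late in the schedule
    noise = torch.randn_like(cond_latents)
    return cond_latents + sigma.view(-1, 1, 1, 1) * noise

# Hypothetical training step:
# t = torch.randint(1, T + 1, (batch,));  x_t = add_noise(video_latents, t)
# cond = perturb_condition(first_frame_latents, t, T)
# loss = mse(model(x_t, t, cond), target)
#
# At inference, the paper's complementary fix is to start sampling from an
# earlier timestep t_m < T rather than from pure noise at T.
```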
The authors propose Follow-Your-Click, a framework for regional image animation driven by a simple user click and a short motion prompt, improving controllability and generation quality.