Core Concepts
AniClipart leverages text-to-video diffusion models and as-rigid-as-possible shape deformation to transform static clipart into high-quality, cartoon-style animations that align with provided text prompts while preserving the visual identity of the original clipart.
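The as-rigid-as-possible idea can be made concrete with its per-cell "local step". For each mesh cell, find the rotation that best maps the cell's rest vertices onto their deformed positions. Whatever that rotation (plus a translation) cannot explain is non-rigid distortion. The sketch below is generic 2D ARAP math in plain Python, not AniClipart's differentiable implementation. The names `best_fit_rotation` and `rigidity_residual` are hypothetical. A full ARAP solver alternates this step with a global linear solve for vertex positions under the keypoint constraints, which is not shown here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _centered(points: List[Point]) -> List[Point]:
    """Subtract the centroid so only a rotation (no translation) remains to be fitted."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def best_fit_rotation(rest: List[Point], deformed: List[Point]) -> float:
    """Angle theta of the 2D rotation R minimizing sum_i ||R p_i - q_i||^2
    over centered rest points p_i and centered deformed points q_i."""
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(_centered(rest), _centered(deformed)):
        dot += px * qx + py * qy      # sum of p_i . q_i
        cross += px * qy - py * qx    # sum of z-components of p_i x q_i
    return math.atan2(cross, dot)


def rigidity_residual(rest: List[Point], deformed: List[Point]) -> float:
    """Squared error left after removing the best rigid motion (rotation + translation).
    Zero for a purely rigid motion; grows when the cell is stretched or sheared."""
    theta = best_fit_rotation(rest, deformed)
    c, s = math.cos(theta), math.sin(theta)
    return sum((c * px - s * py - qx) ** 2 + (s * px + c * py - qy) ** 2
               for (px, py), (qx, qy) in zip(_centered(rest), _centered(deformed)))


# Usage: a triangle rotated by 30 degrees and translated is recognized as rigid.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
a = math.radians(30)
moved = [(math.cos(a) * x - math.sin(a) * y + 5.0, math.sin(a) * x + math.cos(a) * y - 2.0)
         for x, y in tri]
print(math.degrees(best_fit_rotation(tri, moved)))  # approximately 30
print(rigidity_residual(tri, moved))                 # approximately 0
```

In 2D the best rotation has a closed form (an `atan2` of two sums), so the residual can be differentiated with respect to the deformed vertex positions except where both sums vanish.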
Abstract
The paper introduces AniClipart, a system that can automatically animate static clipart images based on text descriptions. The key aspects of the method are:
Clipart Preprocessing:
Detect keypoints and build a skeleton for the clipart using a hybrid approach that combines off-the-shelf keypoint detectors with a custom skeletonization procedure to cover a broader range of object categories.
Construct a triangular mesh over the clipart for shape deformation.
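One simple way to triangulate a clipart outline is ear clipping, shown here as a stand-in. It repeatedly cuts off an "ear" (a convex vertex whose triangle contains no other polygon vertex) until three vertices remain. This is an assumption for illustration: the summary does not say which mesher AniClipart uses. Ear clipping also uses only boundary vertices, so a practical deformation mesh would usually add interior vertices for better-shaped triangles. The outline is assumed to be a simple polygon (no holes, no self-intersections), and `ear_clip` is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[int, int, int]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    n = len(poly)
    return 0.5 * sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                     for i in range(n))


def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside or on the boundary of the counter-clockwise triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def ear_clip(poly: List[Point]) -> List[Triangle]:
    """Triangulate a simple polygon; returns counter-clockwise index triples into poly."""
    idx = list(range(len(poly)))
    if signed_area(poly) < 0:        # work in counter-clockwise order
        idx.reverse()
    triangles: List[Triangle] = []
    while len(idx) > 3:
        n = len(idx)
        for k in range(n):
            i_prev, i, i_next = idx[k - 1], idx[k], idx[(k + 1) % n]
            a, b, c = poly[i_prev], poly[i], poly[i_next]
            if cross(a, b, c) <= 0:  # reflex or collinear vertex: not an ear
                continue
            others = (poly[j] for j in idx if j not in (i_prev, i, i_next))
            if any(in_triangle(p, a, b, c) for p in others):
                continue             # clipping here would cut off another vertex
            triangles.append((i_prev, i, i_next))
            del idx[k]               # clip the ear
            break
        else:
            raise ValueError("no ear found; is the polygon simple?")
    triangles.append((idx[0], idx[1], idx[2]))
    return triangles


# Usage: an L-shaped (non-convex) outline with area 3.
outline = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 2.0)]
tris = ear_clip(outline)
print(len(tris))  # 4, i.e. n - 2 triangles for n = 6 vertices
```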
Bézier-Driven Animation:
Assign a cubic Bézier curve as the motion trajectory for each keypoint, ensuring smooth transitions between frames.
Optimize the Bézier curve parameters using Video Score Distillation Sampling (VSDS) loss, which distills motion knowledge from a pretrained text-to-video diffusion model to align the animation with the provided text prompt.
Incorporate a skeleton loss to maintain the rigidity and visual identity of the original clipart during deformation.
Use a differentiable As-Rigid-As-Possible (ARAP) shape deformation algorithm to warp the clipart according to the updated keypoint positions.
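The per-keypoint trajectories and a bone-length form of the skeleton loss can be sketched as a forward pass. Assumptions: each keypoint's first control point is its rest position (so frame 0 is the undeformed clipart), frames are sampled at evenly spaced values of t, and the skeleton loss is taken as the mean squared change in bone length relative to frame 0. The paper's exact loss may differ. In AniClipart the Bézier parameters are what VSDS optimizes, with gradients flowing through the ARAP deformation and rendering; none of that is modeled here. `sample_trajectories`, `skeleton_loss`, and the keypoint names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at t in [0, 1] using Bernstein weights."""
    s = 1.0 - t
    w0, w1, w2, w3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0],
            w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1])


def sample_trajectories(controls: Dict[str, List[Point]], num_frames: int) -> List[Dict[str, Point]]:
    """Sample each keypoint's trajectory (4 control points) at num_frames evenly spaced t."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    return [{name: cubic_bezier(*ctrl, f / (num_frames - 1)) for name, ctrl in controls.items()}
            for f in range(num_frames)]


def skeleton_loss(frames: List[Dict[str, Point]], bones: List[Tuple[str, str]]) -> float:
    """Mean squared change in bone length relative to frame 0 (the rest pose)."""
    def length(frame: Dict[str, Point], u: str, v: str) -> float:
        (x0, y0), (x1, y1) = frame[u], frame[v]
        return math.hypot(x1 - x0, y1 - y0)

    rest = [length(frames[0], u, v) for u, v in bones]
    errors = [(length(frame, u, v) - r) ** 2
              for frame in frames[1:] for (u, v), r in zip(bones, rest)]
    return sum(errors) / len(errors) if errors else 0.0


# Usage: a hypothetical two-keypoint limb that slides right as a rigid unit.
controls = {"hip": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
            "knee": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]}
frames = sample_trajectories(controls, num_frames=8)
print(skeleton_loss(frames, [("hip", "knee")]))  # approximately 0
```

A rigid translation of a whole limb leaves every bone length unchanged and gives a loss of (numerically) zero, while pulling one keypoint away from its neighbor is penalized. Because a cubic Bézier curve is smooth in t, consecutive frames change gradually, which matches the motivation given for using Bézier trajectories.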
Layered Animation:
Extend the system to handle layered clipart, allowing for animations with topological changes and self-occlusion.
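Self-occlusion between layers can be illustrated with the painter's algorithm: render each layer separately and paint them from farthest to nearest, so nearer layers cover farther ones where they overlap. This is a minimal sketch under stated assumptions. The summary does not say how AniClipart orders or composites its layers, so the per-layer `depth` value, the dictionary layer format, and `composite` are all hypothetical, and a character grid stands in for rendered pixels.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def composite(layers: List[dict], width: int, height: int, background: str = ".") -> List[str]:
    """Painter's algorithm: paint layers from farthest (largest depth) to nearest,
    so nearer layers overwrite farther ones where they overlap."""
    canvas = [[background] * width for _ in range(height)]
    for layer in sorted(layers, key=lambda item: item["depth"], reverse=True):
        for (x, y), color in layer["pixels"].items():
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = color
    return ["".join(row) for row in canvas]


# Usage: an arm layer ("A") crossing a torso layer ("T").
torso = {"depth": 1, "pixels": {(x, y): "T" for x in range(1, 4) for y in range(3)}}
arm_front = {"depth": 0, "pixels": {(x, 1): "A" for x in range(5)}}
arm_back = {"depth": 2, "pixels": arm_front["pixels"]}
print("\n".join(composite([torso, arm_front], 5, 3)))  # arm drawn over the torso
print("\n".join(composite([torso, arm_back], 5, 3)))   # torso hides the arm's middle
```

Changing only the arm's depth switches which layer is visible in the overlap, an effect a single deforming mesh cannot produce on its own.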
Extensive experiments and ablation studies show that AniClipart outperforms existing image-to-video generation models in text-video alignment, visual identity preservation, and motion consistency. The system is also versatile: it can be adapted to generate a broader range of animation formats.
Example Prompts
"A galloping dog."
"A dolphin bends its body flexibly."
"A young girl is jumping."
"A man is scuba diving and swaying fins."
"A woman is dancing."
"A woman bends arms."
"A woman is stomping."