Video Editing via Interpolative Non-autoregressive Masked Transformers
MaskINT is an efficient prompt-based video editing framework that splits the task into two stages: joint editing of keyframes and structure-aware frame interpolation. This design eliminates the need for paired text-video datasets and significantly reduces processing time compared with diffusion-based methods.
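To make the two-stage split concrete, the following is a minimal toy sketch, not the MaskINT implementation. It assumes that keyframes are picked at a fixed stride, that each frame is a short list of discrete tokens, and that the in-between frames are filled by a generic confidence-based parallel decoding loop with a cosine masking schedule, a pattern commonly used with non-autoregressive masked transformers. The names `select_keyframes`, `toy_predictor`, and `masked_parallel_decode` are hypothetical, the predictor is a random stand-in for a trained model, and the structure conditioning mentioned above is omitted entirely.

```python
import math
import random

MASK = -1  # sentinel for a token that has not been decoded yet


def select_keyframes(num_frames, stride):
    """Return keyframe indices at a fixed stride, always including the last frame."""
    idx = list(range(0, num_frames, stride))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def toy_predictor(tokens, rng):
    """Hypothetical stand-in for a masked transformer.

    Returns one (token, confidence) guess per position. A real model would
    condition on the visible tokens; this toy version ignores them.
    """
    return [(rng.randrange(16), rng.random()) for _ in tokens]


def masked_parallel_decode(tokens, steps, predictor, rng):
    """Fill MASK positions in a few parallel steps instead of one token at a time."""
    tokens = list(tokens)
    total = sum(t == MASK for t in tokens)
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = predictor(tokens, rng)
        # Cosine schedule (an assumption): how many tokens stay masked after this step.
        remain = int(total * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - remain)
        # Commit the most confident guesses; the rest stay masked for the next step.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens


# Usage: 9 frames of 4 tokens each, keyframes at stride 4.
rng = random.Random(0)
num_frames, tokens_per_frame = 9, 4
keyframes = select_keyframes(num_frames, stride=4)
video = [[MASK] * tokens_per_frame for _ in range(num_frames)]
for k in keyframes:
    # Placeholder for the tokens of a jointly edited keyframe.
    video[k] = [k] * tokens_per_frame
flat = [t for frame in video for t in frame]
decoded = masked_parallel_decode(flat, steps=4, predictor=toy_predictor, rng=rng)
```

In this sketch the keyframe tokens are never overwritten because only masked positions are candidates for commitment, and all intermediate tokens are resolved in four passes rather than one pass per token, which is the general source of the speed advantage of non-autoregressive decoding.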