DAWN is a novel non-autoregressive, diffusion-based framework that generates high-quality, dynamic-length talking head videos from a single portrait and an audio clip, addressing the speed, error-accumulation, and limited-context drawbacks of earlier autoregressive methods.
MoDiTalker, a novel motion-disentangled diffusion model, generates high-fidelity talking head videos by explicitly separating the generation process into audio-to-motion and motion-to-video stages.
StyleTalker is a novel audio-driven talking head generation model that synthesizes realistic videos of a talking person from a single reference image, with accurate lip sync, head poses, and eye blinks.
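MoDiTalker's separation of generation into audio-to-motion and motion-to-video stages can be illustrated with a minimal pipeline sketch. This is a hypothetical interface, not the paper's implementation: the function names, the landmark-based motion representation, and the stub logic are all illustrative assumptions; the real stages would be diffusion models.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MotionFrame:
    # Illustrative motion representation: per-frame landmark values.
    # (The actual intermediate representation is an assumption here.)
    landmarks: List[float]

def audio_to_motion(audio: List[float], samples_per_frame: int = 25) -> List[MotionFrame]:
    """Stage 1 stub: map an audio sample sequence to a motion sequence."""
    n_frames = max(1, len(audio) // samples_per_frame)
    return [
        MotionFrame(landmarks=[audio[min(i * samples_per_frame, len(audio) - 1)]])
        for i in range(n_frames)
    ]

def motion_to_video(reference_image: List[List[float]],
                    motion: List[MotionFrame]) -> List[List[List[float]]]:
    """Stage 2 stub: render one video frame per motion frame,
    conditioned on the reference portrait."""
    return [reference_image for _ in motion]

# Usage: the two stages compose into the full talking-head pipeline.
audio = [0.0] * 100           # 100 placeholder audio samples
portrait = [[0.5] * 4] * 4    # tiny 4x4 grayscale "portrait"
motion_seq = audio_to_motion(audio)
video = motion_to_video(portrait, motion_seq)
print(len(video))  # one rendered frame per motion frame
```

The key design point the sketch captures is that the intermediate motion sequence fully decouples the two stages: lip sync quality is decided in stage 1, visual fidelity in stage 2.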