StyleTalker is an audio-driven talking head generation model that synthesizes realistic videos of a talking person from a single reference image, with accurate lip sync, natural head poses, and eye blinks.
MoDiTalker, a novel motion-disentangled diffusion model, generates high-fidelity talking head videos by explicitly separating the generation process into audio-to-motion and motion-to-video stages.
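Both summaries share the same high-level contract (audio plus a single reference image in, video frames out), and MoDiTalker additionally factors generation into two explicit stages. The sketch below illustrates that two-stage decomposition in PyTorch; it is not either model's actual code, and every module name, tensor shape, and internal layer here is an illustrative assumption.

```python
# Minimal sketch of an audio-to-motion / motion-to-video split, assuming
# hypothetical module names and shapes; NOT MoDiTalker's real architecture.
import torch
import torch.nn as nn


class AudioToMotion(nn.Module):
    """Stage 1 (hypothetical): map an audio feature sequence to a
    low-dimensional motion sequence (e.g., landmarks or head pose)."""

    def __init__(self, audio_dim: int = 80, motion_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, motion_dim),
        )

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, frames, audio_dim) -> motion: (batch, frames, motion_dim)
        return self.net(audio)


class MotionToVideo(nn.Module):
    """Stage 2 (hypothetical): render frames conditioned on the predicted
    motion and a single reference image."""

    def __init__(self, motion_dim: int = 64, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.decoder = nn.Linear(motion_dim, 3 * image_size * image_size)

    def forward(self, motion: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # motion: (batch, frames, motion_dim); reference: (batch, 3, H, W)
        b, t, _ = motion.shape
        frames = self.decoder(motion).view(b, t, 3, self.image_size, self.image_size)
        # Toy conditioning: broadcast-add the reference image to every frame.
        return frames + reference.unsqueeze(1)


if __name__ == "__main__":
    stage1, stage2 = AudioToMotion(), MotionToVideo()
    audio = torch.randn(1, 25, 80)         # one second of audio features at 25 fps
    reference = torch.randn(1, 3, 64, 64)  # the single reference image
    video = stage2(stage1(audio), reference)
    print(video.shape)                     # torch.Size([1, 25, 3, 64, 64])
```

One plausible reading of the design choice: disentangling motion prediction from rendering lets each stage be trained and evaluated independently, so lip-sync errors can be attributed to stage 1 and visual artifacts to stage 2.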