GaussianTalker is a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) and addresses the challenges of directly controlling 3DGS with speech audio.
DreamHead is a hierarchical diffusion framework that learns the spatial-temporal correspondence between audio input and facial dynamics for high-quality talking head video synthesis.