Key Concepts
FaceTalk introduces a novel generative approach for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signals, utilizing diffusion models in the expression space of neural parametric head models.
Statistics
Given an input speech signal, we propose a diffusion-based approach to synthesize high-quality, temporally consistent 3D motion sequences of high-fidelity human heads represented as neural parametric head models. Our method generates a diverse set of expressions (including wrinkles and eye blinks), and the generated mouth motion is temporally synchronized with the given audio signal.
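The core idea — sampling a sequence of expression codes for the parametric head model by reverse diffusion conditioned on audio features — can be sketched as below. This is a minimal illustrative sketch, not FaceTalk's actual implementation: the `denoiser` placeholder stands in for the learned network, and all names, dimensions, and the audio-feature stand-in are assumptions.

```python
import numpy as np

# Hypothetical sketch of audio-conditioned diffusion sampling over a sequence
# of expression codes (T frames x D dims). Names and sizes are illustrative.

T, D = 8, 16      # frames, expression-code dimension (assumed)
STEPS = 50        # diffusion timesteps (assumed)
rng = np.random.default_rng(0)

# Standard linear noise schedule
betas = np.linspace(1e-4, 0.02, STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, audio_feats):
    # Placeholder for the learned noise predictor eps_theta(x_t, t, audio);
    # it predicts zero noise here just so the loop runs end to end.
    return np.zeros_like(x_t)

def sample_expressions(audio_feats):
    """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
    x = rng.standard_normal((T, D))
    for t in reversed(range(STEPS)):
        eps = denoiser(x, t, audio_feats)
        # DDPM posterior mean update
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:  # add noise on all but the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal((T, D))
    return x  # expression-code sequence driving the parametric head model

audio = rng.standard_normal((T, 32))  # stand-in for per-frame audio features
codes = sample_expressions(audio)
print(codes.shape)  # → (8, 16)
```

The resulting code sequence would then be decoded frame by frame by the neural parametric head model to produce the animated head geometry.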
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation.
Our approach generates plausible, visually natural motion sequences encompassing diverse facial expressions and styles, outperforming existing methods by 75% in a perceptual user study.
Quotes
"Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion." - Authors