Key Concepts
Our proposed Talk3D framework can faithfully reconstruct plausible facial geometries by adopting a pre-trained 3D-aware generative prior and predicting audio-driven dynamic face variations in the NeRF space.
Summary
The paper introduces Talk3D, a novel framework for audio-driven talking head synthesis that leverages a personalized 3D-aware generative prior to faithfully reconstruct facial geometries.
Key highlights:
- Talk3D integrates a pre-trained 3D-aware GAN (EG3D) as the base representation, which allows for rendering realistic talking portraits from unseen viewpoints.
- The model predicts a "deltaplane" that represents the dynamic face variations in the NeRF space, conditioned on the input audio.
- An audio-guided attention U-Net architecture is employed to disentangle audio-uncorrelated variations (e.g., background, torso, and eye movements) from the audio-driven lip movements.
- Extensive experiments demonstrate that Talk3D outperforms state-of-the-art NeRF-based talking head synthesis methods in both quantitative and qualitative evaluations.
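The deltaplane idea above can be sketched minimally: a static triplane from the pre-trained 3D-aware prior is offset by a per-frame, audio-conditioned delta before NeRF rendering. This is an illustrative toy, not the paper's implementation; the dimensions, the `predict_deltaplane` function, and the single linear map (standing in for Talk3D's attention U-Net) are all assumptions for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 3 planes of 32x32 cells
# with 16 feature channels each, and a 64-dim audio feature.
PLANES, RES, CH, AUDIO_DIM = 3, 32, 16, 64

# Static triplane from the pre-trained 3D-aware prior (stand-in values).
base_triplane = rng.standard_normal((PLANES, CH, RES, RES))

# Toy audio-to-deltaplane predictor: a single linear map standing in
# for the audio-guided attention U-Net used by Talk3D.
W = rng.standard_normal((PLANES * CH * RES * RES, AUDIO_DIM)) * 0.01

def predict_deltaplane(audio_feat):
    """Map an audio feature to a per-frame offset in triplane space."""
    delta = W @ audio_feat
    return delta.reshape(PLANES, CH, RES, RES)

audio_feat = rng.standard_normal(AUDIO_DIM)
deltaplane = predict_deltaplane(audio_feat)

# The animated representation fed to the renderer is base + delta,
# so lip motion is modeled as a residual on the static geometry.
animated_triplane = base_triplane + deltaplane
print(animated_triplane.shape)  # (3, 16, 32, 32)
```

The residual formulation is what lets the pre-trained prior stay frozen: only the (much smaller) audio-to-delta predictor has to be learned per identity.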
Statistics
The paper does not provide any specific numerical data or statistics to support the key claims.
Quotes
The paper does not contain any striking quotes supporting the key arguments.