The authors present a novel 3D Gaussian blendshape representation for modeling and animating photorealistic head avatars. The representation consists of a neutral base model and a group of expression blendshapes, all represented as 3D Gaussians.
The neutral model and expression blendshapes are learned from monocular video input. The authors introduce an effective optimization strategy to ensure semantic consistency between the Gaussian blendshapes and the corresponding mesh blendshapes of classical parametric face models. This consistency allows the Gaussian blendshapes to be linearly blended with expression coefficients, synthesizing high-fidelity head avatar animations in real time (370 fps) via Gaussian splatting.
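The linear blending step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each blendshape stores per-Gaussian offsets from the neutral model (positions shown here; the same rule would apply to other Gaussian attributes), and all function and variable names are hypothetical.

```python
import numpy as np

def blend_gaussians(neutral, deltas, coeffs):
    """Linearly combine a neutral Gaussian attribute array with
    per-expression delta blendshapes weighted by expression coefficients.

    neutral: (N, 3) neutral attribute (e.g. Gaussian centers)
    deltas:  (K, N, 3) per-expression offsets from the neutral model
    coeffs:  (K,) expression coefficients, e.g. from face tracking
    """
    # Weighted sum of offsets added to the neutral model.
    return neutral + np.einsum("k,knd->nd", coeffs, deltas)

# Toy usage: 4 Gaussians, 2 expression blendshapes.
neutral = np.zeros((4, 3))
deltas = np.stack([np.ones((4, 3)), 2.0 * np.ones((4, 3))])
blended = blend_gaussians(neutral, deltas, np.array([0.5, 0.25]))
```

Because the blend is a single weighted sum per attribute, it is cheap enough to evaluate every frame before rasterizing with Gaussian splatting, which is what makes the reported real-time rates plausible.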
Compared to state-of-the-art NeRF-based and point-based methods, the authors' Gaussian blendshape representation better captures high-frequency details observed in the input video and achieves significantly faster animation and rendering performance. The method also supports head motion control through joint and pose parameters.
The authors conduct extensive experiments and comparisons, demonstrating the superiority of their approach in terms of image quality metrics and runtime efficiency. They also provide ablation studies to validate the importance of the blendshape consistency and the mouth interior Gaussians.