Key Concepts
We propose a novel 3D neural head avatar representation that achieves real-time rendering while maintaining high quality and fine-grained controllability. The key idea is to attach multiple small hash tables as "local blendshapes" to each vertex of an underlying 3D face model. These tables are linearly merged with expression-dependent weights to produce expression-specific embeddings for efficient neural radiance field decoding.
Summary
The paper presents a novel approach for building efficient and high-quality 3D neural head avatars from monocular RGB videos. The core of the method is the introduction of "mesh-anchored hash table blendshapes", where multiple small hash tables are attached to each vertex of a 3D face parametric model (3DMM). These hash tables act as local "blendshapes" that can be linearly combined with expression-dependent weights to capture fine-grained facial details.
The pipeline consists of three main steps:
Mesh-anchored Hash Table Blendshapes (Sec. 3.1): For each 3DMM vertex, multiple small hash tables are attached as local "blendshapes" to encode the local radiance field information.
Merge Mesh-anchored Blendshapes (Sec. 3.2): A U-Net CNN predicts expression-dependent weights to linearly combine the hash table blendshapes on each vertex, producing expression-specific embeddings.
Neural Radiance Field Decoding (Sec. 3.3): For each query point, the method performs a k-nearest-neighbor search on the 3DMM vertices to pull the merged hash table embeddings, which are then fed into a lightweight MLP to predict the final density and color.
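The merge and lookup steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-based table layout, the brute-force nearest-neighbor search, and the quantization `resolution` are all hypothetical, and the spatial hash borrows the prime-XOR scheme commonly used for hash-grid encodings. The expression-dependent weights are taken as given (the summary attributes them to a U-Net CNN), and the final MLP decoder is omitted.

```python
import math


def merge_blendshapes(tables, weights):
    """Linearly combine one vertex's hash-table blendshapes.

    tables:  B tables, each a list of T entries, each entry a list of F floats.
    weights: B expression-dependent weights (assumed to be predicted elsewhere).
    Returns a single merged table with T entries of F floats.
    """
    num_entries = len(tables[0])
    feat_dim = len(tables[0][0])
    merged = [[0.0] * feat_dim for _ in range(num_entries)]
    for table, w in zip(tables, weights):
        for i, entry in enumerate(table):
            for j, value in enumerate(entry):
                merged[i][j] += w * value
    return merged


def k_nearest_vertices(query, vertices, k):
    """Brute-force k-NN: indices of the k vertices closest to the query point."""
    order = sorted(range(len(vertices)),
                   key=lambda i: math.dist(query, vertices[i]))
    return order[:k]


def hash_index(offset, table_size, resolution=16):
    """Toy spatial hash of a query offset relative to a vertex (an assumption)."""
    x, y, z = (math.floor(c * resolution) for c in offset)
    return ((x * 1) ^ (y * 2654435761) ^ (z * 805459861)) % table_size


def gather_embedding(query, vertices, merged_tables, k):
    """Concatenate the merged-table features pulled from the k nearest vertices."""
    feats = []
    for v in k_nearest_vertices(query, vertices, k):
        offset = [q - p for q, p in zip(query, vertices[v])]
        table = merged_tables[v]
        feats.extend(table[hash_index(offset, len(table))])
    return feats
```

In a full pipeline, the vector returned by `gather_embedding` would be the input to the lightweight MLP that predicts density and color. A real system would also replace the brute-force search with an accelerated one.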
The proposed representation enables efficient inference through hash encoding while retaining the high quality and fine-grained controllability of 3DMM-anchored NeRFs. The authors also introduce a hierarchical k-NN search method to further accelerate rendering. Extensive experiments demonstrate that the method achieves real-time rendering (>30 FPS) with quality comparable to state-of-the-art high-quality 3D head avatars, and significantly better results on challenging expressions than prior efficient 3D avatar approaches.
Statistics
This summary reports no specific numerical results beyond real-time rendering at more than 30 FPS.
Quotes
The paper does not contain any striking quotes that support its key arguments.