Core Concepts
A novel decoder network that converts the output of pre-trained 3D-aware GANs into explicit 3D Gaussian Splatting scenes, enabling high-quality and high-resolution rendering at real-time frame rates.
Abstract
The authors present a novel approach that combines the high rendering quality of NeRF-based 3D-aware Generative Adversarial Networks (GANs) with the flexibility and computational advantages of 3D Gaussian Splatting (3DGS).
Key highlights:
- The authors train a decoder network that maps the implicit NeRF representations from 3D-aware GANs like EG3D and PanoHead to explicit 3D Gaussian Splatting attributes.
- This allows integrating the representational diversity and quality of 3D GANs into the 3DGS ecosystem for the first time.
- The decoder enables high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes.
- The authors propose a sequential decoder architecture, a strategy for sampling Gaussian splat positions, and a generator backbone fine-tuning technique to improve the decoder's capacity.
- Quantitative and qualitative results demonstrate that the decoded 3D Gaussian Splatting scenes achieve high visual similarity to the target 3D-aware GANs, while enabling real-time rendering at up to 5 times higher frame rates.
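The highlights above can be sketched in code. The following is a minimal, hypothetical PyTorch illustration of the core idea (a decoder mapping features sampled from a pre-trained 3D-aware GAN to explicit 3D Gaussian Splatting attributes); all class, head, and dimension names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: an MLP decoder maps per-point features sampled from
# a pre-trained 3D-aware GAN (e.g. EG3D's tri-plane) to explicit 3DGS
# splat attributes. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianDecoder(nn.Module):
    """Maps per-point GAN features to 3D Gaussian Splatting attributes."""

    def __init__(self, feat_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head per splat attribute: position offset, log-scale,
        # rotation quaternion, opacity logit, and RGB color.
        self.heads = nn.ModuleDict({
            "offset": nn.Linear(hidden, 3),
            "log_scale": nn.Linear(hidden, 3),
            "rotation": nn.Linear(hidden, 4),
            "opacity": nn.Linear(hidden, 1),
            "color": nn.Linear(hidden, 3),
        })

    def forward(self, feats: torch.Tensor) -> dict:
        h = self.backbone(feats)
        out = {name: head(h) for name, head in self.heads.items()}
        # Normalize quaternions and squash opacity into [0, 1] so the
        # outputs are valid 3DGS attributes.
        out["rotation"] = nn.functional.normalize(out["rotation"], dim=-1)
        out["opacity"] = torch.sigmoid(out["opacity"])
        return out

decoder = GaussianDecoder()
feats = torch.randn(4096, 32)   # features sampled at 4096 candidate positions
splats = decoder(feats)         # dict of per-splat attribute tensors
```

Once decoded, the attribute tensors can be handed directly to a standard 3DGS rasterizer, which is what enables the real-time rendering and editing the paper reports.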
Stats
Rendering the decoded Gaussian Splatting scene achieves roughly four times the frame rate of rendering the 3D-aware GANs directly.
When the resolution is increased four-fold, the Gaussian Splatting renderer still achieves more than three times the frame rate of the GAN models at the lower resolution.
Quotes
"Combining NeRFs and GANs is highly advantageous, as rendering from a latent space offers multiple benefits: Firstly, it allows for rendering an unlimited amount of unique appearances. Secondly, a large variety of editing methods can be applied. And thirdly, single 2D images can be inverted, using 3D GAN inversion, allowing for full 3D reconstructions from a single image."
"Sampling visual information from latent spaces with large representational variety poses a challenge for rendering with 3DGS, as the framework requires the information for the appearance of the scene to be encoded as attributes of individual splats, rather than in the latent space itself."