Core Concepts
We propose to learn a 3D semantic template feature field along with the generative model, enabling efficient on-the-fly pose estimation of training images to facilitate 3D-aware GAN training from unposed images.
Abstract
The authors present a novel approach for learning 3D-aware generative models from in-the-wild images whose camera pose distribution is unknown. The key idea is to learn a 3D semantic template feature field jointly with the generative model, which allows efficient on-the-fly pose estimation of training images.
Specifically:
- The generator is augmented to jointly produce a radiance field and a semantic feature field that share the same density.
- The mean of the learned feature field serves as a 3D template, enabling efficient 2D-3D pose estimation for real images.
- Pose estimation is performed by discretizing the camera pose space, rendering the template features at each candidate pose, and selecting the pose whose rendering best matches the features of the real image.
- The authors combine grid search with phase correlation to estimate the camera pose efficiently; phase correlation recovers the scale and in-plane rotation.
- Experiments on various challenging datasets, including real-world cars, planes, and elephants, demonstrate the superiority of the proposed method over state-of-the-art alternatives.
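Because the two fields share one density, a single set of compositing weights can render both the image and the semantic feature map, so the two stay pixel-aligned. The sketch below assumes standard NeRF-style quadrature (alpha = 1 - exp(-sigma * delta)), which the summary does not specify; `composite_ray` and its arguments are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def composite_ray(
    sigmas: Sequence[float],
    deltas: Sequence[float],
    colors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], float]:
    """Alpha-composite color and semantic features along one ray.

    Both outputs use the same per-sample weights, derived from the shared
    density (assumed NeRF-style quadrature), so they stay aligned.
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed
    rgb = [0.0] * len(colors[0])
    feat = [0.0] * len(features[0])
    for sigma, delta, c, f in zip(sigmas, deltas, colors, features):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha           # shared compositing weight
        rgb = [acc + weight * ci for acc, ci in zip(rgb, c)]
        feat = [acc + weight * fi for acc, fi in zip(feat, f)]
        transmittance *= 1.0 - alpha
    return rgb, feat, 1.0 - transmittance  # last value: accumulated opacity


# Usage: a fully transparent sample followed by an opaque one.
rgb, feat, opacity = composite_ray(
    sigmas=[0.0, 1e6],
    deltas=[1.0, 1.0],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    features=[[5.0], [7.0]],
)
print(rgb, feat, opacity)  # → [0.0, 1.0, 0.0] [7.0] 1.0
```

The feature branch adds no extra density or visibility computation: it reuses the weights already computed for color.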
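The template-matching step can be pictured as follows: average the feature field into a template, render it at every pose of a discretized grid, and keep the pose whose rendering best matches the image features. This is a minimal sketch under several assumptions not stated in the summary: the template is taken as a per-point average over fields generated from different latent codes, matching uses mean per-pixel cosine similarity, poses are a hypothetical (azimuth, elevation) pair, and `toy_render` is a hypothetical stand-in for actually rendering the template field.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Feature = List[float]
FeatureMap = List[Feature]  # one feature vector per pixel (flattened H*W)
Pose = Tuple[float, float]  # hypothetical (azimuth_deg, elevation_deg)


def mean_template(fields: Sequence[Sequence[Feature]]) -> List[Feature]:
    """Average per-point features over fields from different latent codes."""
    n = len(fields)
    return [[sum(vals) / n for vals in zip(*points)] for points in zip(*fields)]


def similarity(a: FeatureMap, b: FeatureMap) -> float:
    """Mean per-pixel cosine similarity between two feature maps."""
    total = 0.0
    for fa, fb in zip(a, b):
        dot = sum(x * y for x, y in zip(fa, fb))
        norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
        total += dot / norm if norm > 0 else 0.0
    return total / len(a)


def render_grid(
    render_template: Callable[[Pose], FeatureMap],
    azimuths: Sequence[float],
    elevations: Sequence[float],
) -> Dict[Pose, FeatureMap]:
    """Render the template once per pose of the discretized pose space."""
    return {pose: render_template(pose) for pose in itertools.product(azimuths, elevations)}


def estimate_pose(image_features: FeatureMap, views: Dict[Pose, FeatureMap]) -> Tuple[Pose, float]:
    """Return the grid pose whose template rendering best matches the image."""
    scored = ((pose, similarity(view, image_features)) for pose, view in views.items())
    return max(scored, key=lambda item: item[1])


def toy_render(pose: Pose) -> FeatureMap:
    """Hypothetical stand-in for rendering the template feature field."""
    az, el = (math.radians(v) for v in pose)
    return [[math.cos(az), math.sin(az)], [math.cos(el), math.sin(el)]]


views = render_grid(toy_render, azimuths=[0, 90, 180, 270], elevations=[0, 30])
pose, score = estimate_pose(toy_render((90, 30)), views)
print(pose)  # → (90, 30)
```

Since every training image is matched against the same template, the grid of template views could plausibly be rendered once per training step and reused across the whole batch; only the cheap similarity scores are per image.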
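Phase correlation recovers a translation from the phase of the normalized cross-power spectrum. A common way to apply it to rotation and scale (the Fourier-Mellin idea) is to resample features onto a polar or log-polar grid, where an in-plane rotation becomes a circular shift along the angle axis and a scale change becomes a shift along the log-radius axis. The summary does not describe how the authors set this up, so the following is a minimal 1D sketch under that assumption. `phase_correlation_shift`, `rotation_from_shift`, and `scale_from_shift` are hypothetical names, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of `dft`."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> int:
    """Return the circular shift s (in bins) such that b[n] ~= a[(n - s) % N]."""
    cross = []
    for fa, fb in zip(dft(a), dft(b)):
        r = fb * fa.conjugate()
        mag = abs(r)
        cross.append(r / mag if mag > eps else 0j)  # keep phase, drop magnitude
    corr = idft(cross)  # ideally a single sharp peak at index s
    return max(range(len(corr)), key=lambda t: corr[t].real)


def signed_shift(shift: int, n: int) -> int:
    """Map a shift in [0, n) to the equivalent shift in (-n/2, n/2]."""
    return shift - n if shift > n // 2 else shift


def rotation_from_shift(shift: int, n_angle_bins: int) -> float:
    """Angle-axis shift on a polar grid -> in-plane rotation in degrees.

    The sign depends on the (assumed) direction of the polar grid.
    """
    return 360.0 * signed_shift(shift, n_angle_bins) / n_angle_bins


def scale_from_shift(shift: int, n_radius_bins: int, log_r_step: float) -> float:
    """Log-radius shift on a log-polar grid -> scale factor.

    Assumes the shift is small: the log-radius axis is not truly periodic.
    """
    return math.exp(signed_shift(shift, n_radius_bins) * log_r_step)


profile = [1.0, 3.0, 0.0, 2.0, 5.0, 4.0, 0.5, 2.5]  # e.g. features along the angle axis
rotated = [profile[(n - 3) % 8] for n in range(8)]  # same profile shifted by 3 bins
s = phase_correlation_shift(profile, rotated)
print(s, rotation_from_shift(s, 8))  # → 3 135.0
```

One correlation recovers the shift directly instead of scoring every candidate rotation or scale, which is presumably why it pairs well with a coarse grid search over the remaining pose parameters.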
Stats
The authors use the following datasets:
- ShapeNet Cars: a synthetic dataset with ground-truth camera poses.
- CompCars: a real-world dataset with 136k unposed car images.
- SDIP Elephant: a dataset with 20k unposed elephant images.
- LSUN Plane: a dataset with 130k unposed plane images.
Quotes
"Our key idea is to learn a 3D semantic template feature field along with the generative model and define the object pose estimation as an auxiliary task taking the template feature field as the canonical object space."
"We propose to efficiently solve the camera pose estimation by incorporating phase correlation for [estimating] scale and in-plane rotation."
"Our model learns 3D-aware generative models on multiple challenging datasets, including real-world cars, planes, and elephants."