Key Concept
StructLDM, a diffusion-based unconditional 3D human generative model, learns a structured and dense latent representation to capture the articulated structure and semantics of the human body, enabling diverse and controllable 3D human generation.
Abstract
The paper proposes StructLDM, a two-stage framework for 3D human generation. In the first stage, an auto-decoder is learned to optimize a structured 2D latent representation for each training subject, which preserves the articulated structure and semantics of the human body. The auto-decoder consists of a set of structured NeRFs that are locally conditioned on the 2D latent to render pose- and view-dependent human images.
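The key idea of stage one is that each subject gets its own learnable 2D latent map, and the NeRF renderer is conditioned *locally*: each point on the body surface reads only the latent feature at its own location. A minimal sketch of this local-conditioning lookup, with hypothetical names and shapes (not the paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stage-one setup: one learnable structured 2D latent per
# training subject (a UV-like H x W x C map) instead of a single 1D code.
# Shapes and names here are illustrative assumptions.
NUM_SUBJECTS, H, W, C = 4, 32, 16, 8
latents = rng.normal(scale=0.01, size=(NUM_SUBJECTS, H, W, C))

def sample_local_latent(subject_id, u, v):
    """Look up the local latent feature that conditions the NeRF at a
    body-surface location (u, v) in [0, 1) -- 'local conditioning'."""
    i = int(u * H)
    j = int(v * W)
    return latents[subject_id, i, j]

# Every surface point is conditioned on its own C-dim feature, which is
# what preserves the articulated structure and per-part semantics.
feat = sample_local_latent(0, 0.5, 0.25)
print(feat.shape)
```

In the auto-decoding setup, these per-subject latents would be optimized jointly with the shared renderer weights by backpropagating a photometric loss through the rendered images.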
In the second stage, the learned structured latents are used to train a latent diffusion model, which enables diverse and realistic human generation. The structured latent representation allows StructLDM to achieve state-of-the-art performance on 3D human generation compared to existing 3D-aware GAN methods.
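Because the stage-one latents are dense 2D maps, stage two can treat them like images and train a standard latent diffusion model on them. As a sketch, here is only the conventional DDPM forward (noising) step applied to a structured latent; the schedule hyperparameters are assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM forward process in closed form, applied to a structured
# 2D latent. Beta schedule and shapes are illustrative assumptions.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def q_sample(z0, t, noise):
    """q(z_t | z_0): noise a clean structured latent z0 to timestep t."""
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * noise

z0 = rng.normal(size=(32, 16, 8))   # a clean structured latent (H, W, C)
zt = q_sample(z0, t=999, noise=rng.normal(size=z0.shape))
print(zt.shape)                     # the latent keeps its 2D structure
```

The denoiser would then be trained to invert this process; at sampling time, denoising from pure noise yields a novel structured latent that the stage-one decoder renders into a 3D human.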
StructLDM also enables various controllable generation and editing tasks, such as pose/view/shape control, compositional generation, part-aware clothing editing, and 3D virtual try-on, by leveraging the structured latent space and the diffusion model.
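The structured latent makes these edits simple: if regions of the 2D latent map correspond to body parts, then part-aware editing reduces to region-wise operations on the latent. A toy sketch under that assumption (the row ranges and names are hypothetical, not the paper's layout):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of part-aware editing on a structured 2D latent.
# Assumption: certain rows of the latent map cover the upper body.
H, W, C = 32, 16, 8
latent_a = rng.normal(size=(H, W, C))   # latent for person A
latent_b = rng.normal(size=(H, W, C))   # latent for person B
upper_body = slice(8, 20)               # hypothetical upper-body rows

# "Virtual try-on": copy B's upper-body region into A's latent,
# leaving all other regions of A untouched.
edited = latent_a.copy()
edited[upper_body] = latent_b[upper_body]

print(edited.shape)
```

Rendering `edited` through the stage-one decoder would then show person A wearing B's top, while pose and view remain controllable through the decoder's conditioning, since they are not baked into the latent.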
Statistics
"Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images."
"Existing works overlook the semantics and structure of the human body and sample humans in a compact 1D space, which severely limits their controlling ability."
"StructLDM achieves diverse and high-quality 3D human generation, outperforming existing 3D-aware GAN methods on FID."
Quotes
"Different from the widely adopted 1D latent, we explore the higher-dimensional latent space without latent mapping for 3D human generation and editing."
"We propose StructLDM, a diffusion-based 3D human generative model, which achieves state-of-the-art results in 3D human generation."
"Emerging from our design choices, we show novel controllable generation and editing tasks, e.g., 3D compositional generations, part-aware 3D editing, 3D virtual try-on."