The paper proposes StructLDM, a two-stage framework for 3D human generation. In the first stage, an auto-decoder is trained to optimize a structured 2D latent representation for each training subject; this representation preserves the articulated structure and semantics of the human body. The auto-decoder consists of a set of structured NeRFs that are locally conditioned on the 2D latent and render pose- and view-dependent human images.
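To make the auto-decoding idea concrete, here is a minimal NumPy sketch, assuming a toy setup: each subject owns a learnable 2D latent map, and a shared decoder is optimized jointly with those maps to reconstruct that subject's target. The function name `train_auto_decoder`, the linear per-texel decoder, the learning rate, and the random targets are all illustrative assumptions; the linear decoder merely stands in for the structured NeRFs and volume rendering and is not the paper's architecture.

```python
import numpy as np

def train_auto_decoder(targets, latent_channels=8, steps=200, lr=0.1, seed=0):
    """Jointly fit one 2D latent map per subject and a shared decoder.

    targets: array of shape (num_subjects, H, W, 3), a toy stand-in for
    training images. The linear decoder is a placeholder (assumption),
    not the structured NeRF renderer described in the paper.
    """
    rng = np.random.default_rng(seed)
    n, h, w, _ = targets.shape
    # There is no encoder: the latents themselves are free parameters.
    latents = 0.1 * rng.standard_normal((n, h, w, latent_channels))
    decoder = 0.1 * rng.standard_normal((latent_channels, 3))
    losses = []
    for _ in range(steps):
        pred = latents @ decoder                  # (n, h, w, 3)
        err = pred - targets
        losses.append(float(np.mean(err ** 2)))
        # Each texel's latent follows the gradient of its own squared error.
        grad_latents = 2.0 * (err @ decoder.T)
        # The shared decoder follows the gradient averaged over all texels.
        grad_decoder = 2.0 * np.einsum("nhwc,nhwk->ck", latents, err) / (n * h * w)
        latents -= lr * grad_latents
        decoder -= lr * grad_decoder
    return latents, decoder, losses

# Toy usage: 4 "subjects" with random 8x8 RGB targets.
toy_targets = np.random.default_rng(1).uniform(0.0, 1.0, size=(4, 8, 8, 3))
latents, decoder, losses = train_auto_decoder(toy_targets)
```

The point of the sketch is only the optimization pattern: the per-subject latents are updated by gradient descent alongside the decoder weights, so after training each subject is represented by its own 2D latent map.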
In the second stage, the learned structured latents are used to train a latent diffusion model, which enables diverse and realistic human generation. The paper reports that, thanks to the structured latent representation, StructLDM achieves state-of-the-art performance on 3D human generation when compared with existing 3D-aware GAN methods.
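As a rough illustration of what training a diffusion model on the learned latents involves, the sketch below assumes a standard DDPM-style formulation: a latent $z_0$ is noised to $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$, and a network is trained to predict $\epsilon$. The summary does not state which noise schedule or objective StructLDM uses, so the linear schedule, the function names, and the `predict_noise` callable are assumptions made for illustration.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_latents(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) for a batch of 2D latents.

    z0: (batch, H, W, C) latents; t: (batch,) integer timesteps.
    """
    eps = rng.standard_normal(z0.shape)
    a = alpha_bars[t].reshape(-1, 1, 1, 1)
    zt = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return zt, eps

def denoising_loss(predict_noise, z0, alpha_bars, rng):
    """Mean squared error between predicted and true noise at random timesteps."""
    t = rng.integers(0, len(alpha_bars), size=z0.shape[0])
    zt, eps = noise_latents(z0, t, alpha_bars, rng)
    return float(np.mean((predict_noise(zt, t) - eps) ** 2))

# Toy usage with random "latents" and a placeholder predictor that outputs zeros.
rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
z0 = rng.standard_normal((4, 8, 8, 8))
loss = denoising_loss(lambda zt, t: np.zeros_like(zt), z0, alpha_bars, rng)
```

In a real system `predict_noise` would be a trained denoising network operating on the 2D latent maps; the zero predictor here only demonstrates the call pattern.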
StructLDM also enables a range of controllable generation and editing tasks, such as pose, view, and shape control, compositional generation, part-aware clothing editing, and 3D virtual try-on, by leveraging the structured latent space and the diffusion model.
Key Insights Extracted From
by Tao Hu, Fangz... at arxiv.org 04-02-2024
https://arxiv.org/pdf/2404.01241.pdf