The paper proposes StructLDM, a two-stage framework for 3D human generation. In the first stage, an auto-decoder is trained to optimize a structured 2D latent for each training subject; this latent preserves the articulated structure and semantics of the human body. The auto-decoder consists of a set of structured NeRFs that are locally conditioned on the 2D latent and render pose- and view-dependent human images.
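To make the auto-decoder idea concrete, here is a minimal sketch of the general technique, not the paper's implementation. It assumes a toy linear decoder in place of the structured NeRFs and a short vector in place of the 2D latent; the names `decode`, `mean_loss`, and `train_auto_decoder` and the toy `targets` are hypothetical. The point it illustrates is that there is no encoder: each subject owns a latent code that is optimized jointly with the shared decoder weights.

```python
import random


def decode(W, z):
    """Shared linear decoder (a stand-in for the NeRF decoder): latent -> observation."""
    return [sum(w * zc for w, zc in zip(row, z)) for row in W]


def mean_loss(W, latents, targets):
    """Mean squared reconstruction error over all subjects."""
    total = 0.0
    for z, x in zip(latents, targets):
        total += sum((p - t) ** 2 for p, t in zip(decode(W, z), x))
    return total / len(targets)


def train_auto_decoder(targets, latent_dim=2, steps=500, lr=0.05, seed=0):
    """Jointly optimize one latent per subject and the shared decoder weights."""
    rng = random.Random(seed)
    n, obs_dim = len(targets), len(targets[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(obs_dim)]
    # No encoder: every subject gets its own freely optimized latent code.
    latents = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(n)]
    initial = mean_loss(W, latents, targets)

    for _ in range(steps):
        grad_W = [[0.0] * latent_dim for _ in range(obs_dim)]
        grad_Z = []
        for z, x in zip(latents, targets):
            residual = [p - t for p, t in zip(decode(W, z), x)]
            # Gradient of the mean squared error w.r.t. the shared decoder.
            for r in range(obs_dim):
                for c in range(latent_dim):
                    grad_W[r][c] += 2.0 * residual[r] * z[c] / n
            # Gradient w.r.t. this subject's own latent code.
            grad_Z.append([
                2.0 * sum(W[r][c] * residual[r] for r in range(obs_dim)) / n
                for c in range(latent_dim)
            ])
        # Plain gradient-descent step on decoder weights and all latents.
        for r in range(obs_dim):
            for c in range(latent_dim):
                W[r][c] -= lr * grad_W[r][c]
        for z, g in zip(latents, grad_Z):
            for c in range(latent_dim):
                z[c] -= lr * g[c]

    return W, latents, initial, mean_loss(W, latents, targets)


# Toy "observations" for four subjects (made-up numbers, for illustration only).
targets = [
    [1.0, 0.5, -0.5],
    [0.2, -1.0, 0.3],
    [-0.7, 0.4, 0.9],
    [0.5, 0.5, 0.5],
]
W, latents, initial_loss, final_loss = train_auto_decoder(targets)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

After training, `latents` holds one optimized code per subject. In a two-stage setup of this kind, those codes are the data on which a generative model is later trained.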
In the second stage, the learned structured latents are used to train a latent diffusion model, which enables diverse and realistic human generation. According to the paper, the structured latent representation allows StructLDM to achieve state-of-the-art results on 3D human generation, outperforming existing 3D-aware GAN methods.
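As a rough illustration of what training a diffusion model on 2D latents involves, the sketch below implements the standard DDPM forward (noising) step on a small latent grid. The summary does not state which noise schedule or parameterization StructLDM uses, so the linear beta schedule, the step count, and the names `linear_alpha_bars` and `add_noise` are all assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in latent]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noisy = [
        [signal_scale * x + noise_scale * e for x, e in zip(lat_row, eps_row)]
        for lat_row, eps_row in zip(latent, noise)
    ]
    return noisy, noise


rng = random.Random(0)
alpha_bars = linear_alpha_bars()

# A toy 4x4 grid standing in for one subject's learned 2D latent.
latent = [[0.1 * (r + c) for c in range(4)] for r in range(4)]

t = 500
noisy_latent, noise = add_noise(latent, alpha_bars[t], rng)
# A denoising network would be trained to predict `noise` from (noisy_latent, t).
```

Because the latents are fixed after the first stage, this noising step can be applied to them directly, without passing through the decoder.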
StructLDM also supports a range of controllable generation and editing tasks, such as pose, view, and shape control, compositional generation, part-aware clothing editing, and 3D virtual try-on, by leveraging the structured latent space and the diffusion model.
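One way to picture part-aware editing on a structured 2D latent is mask-based compositing: regions of one subject's latent are replaced with the matching regions of another's. The sketch below is a toy illustration under that assumption, not the paper's editing procedure; `compose_latents`, the grids, and the mask are hypothetical.

```python
def compose_latents(base, donor, mask):
    """Take donor values where mask is 1 and base values elsewhere."""
    return [
        [d if m else b for b, d, m in zip(base_row, donor_row, mask_row)]
        for base_row, donor_row, mask_row in zip(base, donor, mask)
    ]


# Two toy 3x3 latents and a mask selecting the top row (e.g. an "upper-body" region).
base = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
donor = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
mask = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]

edited = compose_latents(base, donor, mask)
```

This only works because the latent is spatially aligned with the body: the same grid location is assumed to describe the same body part for every subject.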
Key insights distilled from a paper by Tao Hu, Fangz... on arxiv.org, 04-02-2024
https://arxiv.org/pdf/2404.01241.pdf