FashionEngine is an interactive system that enables the generation and editing of high-quality 3D clothed humans. It consists of three key components:
A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, providing strong priors for diverse generation and editing tasks.
A multimodality-UV space that encodes the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space. This aligns user inputs such as text, images, and sketches with the implicit UV latent space for controllable 3D human editing.
Multimodality-UV aligned samplers that learn to sample high-quality, diverse 3D humans from the diffusion prior under multimodal user inputs, enabling text-, sketch-, and image-driven generation and editing (a toy sketch of this conditional sampling follows the list).
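To make the interplay between the multimodality-UV space and the diffusion prior concrete, here is a minimal, self-contained toy sketch in Python. Everything in it (the latent grid size, the linear projection standing in for the learned multimodal alignment, and the stand-in denoising loop) is an assumption for illustration, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sizes for a UV-aligned latent grid and a shared condition
# embedding; the paper does not specify these, they are assumptions.
UV_H, UV_W, LATENT_C = 32, 32, 16
COND_DIM = 64

def encode_condition(feature: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project a modality-specific feature (from a text, image, or sketch
    encoder) into the shared UV-aligned condition space. A single linear
    map stands in for the learned multimodality-UV alignment."""
    return projection @ feature

def sample_uv_latent(cond_embed: np.ndarray, steps: int = 50,
                     guidance: float = 2.0) -> np.ndarray:
    """Toy conditional denoising loop over the semantic UV latent: start
    from noise and repeatedly nudge the latent toward the condition,
    mimicking guided sampling from the diffusion prior."""
    rng = np.random.default_rng(0)
    z = rng.standard_normal((UV_H, UV_W, LATENT_C))   # pure noise
    target = cond_embed[:LATENT_C]                    # broadcast over the UV grid
    for t in range(steps):
        noise_level = 1.0 - t / steps                 # linear schedule
        residual = z - target                         # stand-in denoiser output
        z = z - guidance * residual / steps           # guided update
        z += 0.01 * noise_level * rng.standard_normal(z.shape)
    return z

# Usage: a random stand-in "text feature" steering one sample.
rng = np.random.default_rng(42)
text_feature = rng.standard_normal(COND_DIM)
projection = rng.standard_normal((COND_DIM, COND_DIM)) / np.sqrt(COND_DIM)
latent = sample_uv_latent(encode_condition(text_feature, projection))
print(latent.shape)  # (32, 32, 16): a UV-aligned latent, decoded to a 3D human downstream
```

The design point the sketch mirrors is that all modalities land in one canonical UV-aligned space, so the same sampler can serve text-, sketch-, and image-driven generation.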
The system allows users to generate 3D clothed humans either randomly or conditionally from text descriptions or hand-drawn sketches. Users can then interactively edit the generated humans using text, reference images, or sketches. The final 3D humans can be adjusted in pose and shape before being rendered into images or videos.
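This workflow can be pictured as a small session API. The following sketch is hypothetical: the class names, method names, and the masked re-sampling used for editing are assumptions chosen to mirror the described generate, edit, repose, and render loop, not FashionEngine's real interface.

```python
import numpy as np

class ToyPrior:
    """Stand-in for the pre-trained 3D human diffusion prior."""

    def sample(self, text=None, sketch=None):
        # Conditions would steer real sampling; here they only pick a seed.
        seed = hash((text, sketch)) % (2**32)
        return np.random.default_rng(seed).standard_normal((32, 32, 16))

    def resample(self, latent, mask, text=None, image=None, sketch=None):
        # Editing as masked re-sampling: regenerate only the UV regions
        # selected by the edit, keep everything else fixed.
        edited = self.sample(text=text, sketch=sketch)
        return np.where(mask[..., None], edited, latent)

class Session:
    """Generate once, then iterate edits on the same UV latent."""

    def __init__(self, prior):
        self.prior, self.latent = prior, None

    def generate(self, text=None, sketch=None):
        self.latent = self.prior.sample(text=text, sketch=sketch)

    def edit(self, region_mask, text=None, image=None, sketch=None):
        self.latent = self.prior.resample(self.latent, region_mask,
                                          text=text, image=image, sketch=sketch)

# Usage: conditional generation, then a localized text-driven edit.
session = Session(ToyPrior())
session.generate(text="a woman in a red dress")
upper_body = np.zeros((32, 32), dtype=bool)
upper_body[:16] = True                      # hypothetical upper-body UV region
session.edit(upper_body, text="denim jacket")
print(session.latent.shape)                 # edited latent; pose/shape adjustment
                                            # and rendering would follow
```

Masked re-sampling is one plausible way to keep an edit localized to a clothing region; the paper's actual editing mechanism may differ.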
Extensive experiments validate FashionEngine's state-of-the-art performance for conditional generation and editing tasks. The interactive user interface enables both conditional and unconditional generation, as well as various editing tasks in a unified framework.
Key insights distilled from the source content by Tao Hu, Fangz... at arxiv.org, 04-03-2024: https://arxiv.org/pdf/2404.01655.pdf