The paper introduces AWOL, a method for generating novel 3D animals and trees from text or images. The key idea is to learn a mapping between the latent space of a vision-language model (such as CLIP) and the parameter space of existing 3D shape models. Language can then control 3D shape generation, including the creation of objects that were not present in the training data.
The authors first introduce SMAL+, a new 3D parametric shape model for animals that extends previous models with more species. They then use a Real-NVP network to learn the mapping from CLIP's latent space to the shape parameters of SMAL+ and to the parameters of a procedural tree generation model.
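To make the Real-NVP idea concrete, the following is a minimal sketch of the affine coupling layers such a network is built from, written with NumPy. It is not the authors' implementation: the class names `AffineCoupling` and `Flow`, the layer count, the hidden width, and the 8-dimensional toy input are all assumptions chosen for illustration, and the weights are random rather than trained. The sketch only shows the property that makes this family of networks suitable for mapping between two spaces: each layer is exactly invertible.

```python
import numpy as np


class AffineCoupling:
    """One Real-NVP-style affine coupling layer (illustrative, untrained)."""

    def __init__(self, dim, hidden, rng, flip=False):
        assert dim % 2 == 0, "this sketch assumes an even dimension"
        self.d = dim // 2
        self.flip = flip  # which half is kept fixed in this layer
        # Tiny random networks that predict a scale and a shift (hypothetical sizes).
        self.w1 = rng.normal(scale=0.1, size=(self.d, hidden))
        self.w_s = rng.normal(scale=0.1, size=(hidden, self.d))
        self.w_t = rng.normal(scale=0.1, size=(hidden, self.d))

    def _split(self, x):
        a, b = x[..., :self.d], x[..., self.d:]
        return (b, a) if self.flip else (a, b)  # (kept, changed)

    def _merge(self, kept, changed):
        parts = (changed, kept) if self.flip else (kept, changed)
        return np.concatenate(parts, axis=-1)

    def _scale_shift(self, kept):
        h = np.tanh(kept @ self.w1)
        return np.tanh(h @ self.w_s), h @ self.w_t  # bounded log-scale, shift

    def forward(self, x):
        kept, changed = self._split(x)
        s, t = self._scale_shift(kept)
        return self._merge(kept, changed * np.exp(s) + t)

    def inverse(self, y):
        kept, changed = self._split(y)
        s, t = self._scale_shift(kept)  # same s, t because `kept` is unchanged
        return self._merge(kept, (changed - t) * np.exp(-s))


class Flow:
    """Stack of coupling layers that alternate which half is transformed."""

    def __init__(self, dim, n_layers=4, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [
            AffineCoupling(dim, hidden, rng, flip=(i % 2 == 1))
            for i in range(n_layers)
        ]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def inverse(self, y):
        for layer in reversed(self.layers):
            y = layer.inverse(y)
        return y


if __name__ == "__main__":
    flow = Flow(dim=8)
    # Toy vectors standing in for embeddings; real CLIP features are much larger.
    x = np.random.default_rng(1).normal(size=(3, 8))
    y = flow.forward(x)
    print(np.allclose(flow.inverse(y), x))
```

Because a coupling layer leaves one half of its input untouched, the scale and shift can be recomputed exactly during inversion, which is why the round trip recovers the input. A flow of this kind preserves dimensionality; how the paper reconciles the size of the CLIP embedding with the size of the shape parameter vector is not covered in this summary.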
The experiments show that AWOL can generate realistic 3D animals and trees, including novel species and breeds that were not seen during training. The method can handle both text and image inputs, and the generated shapes are rigged and ready for rendering and animation. The authors also perform extensive ablation studies to analyze the impact of different design choices in the Real-NVP network.
Overall, AWOL demonstrates the potential of using language to control and generate novel 3D content, going beyond the limitations of existing 3D shape models.
Key insights distilled from the source content by Silvia Zuffi... at arxiv.org, 04-05-2024
https://arxiv.org/pdf/2404.03042.pdf