Core Concepts
PI3D is a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes.
Summary
The paper presents PI3D, a framework that efficiently generates high-quality 3D shapes from text prompts by leveraging pre-trained text-to-image diffusion models.
The key ideas are:
- Representing a 3D shape as a set of "pseudo-images": a triplane representation whose planes share semantic congruence with images rendered from orthogonal views.
- Fine-tuning a pre-trained text-to-image diffusion model to generate these pseudo-images, enabling fast sampling of 3D objects from text prompts.
- Using a lightweight refinement process based on Score Distillation Sampling (SDS) to further improve the quality of the sampled 3D objects.
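The triplane "pseudo-image" representation in the first point can be sketched as follows. This is a minimal NumPy illustration of how a 3D point is featurized from three axis-aligned planes; the channel width, resolution, and nearest-neighbour lookup are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative, not the paper's settings).
C, H, W = 8, 32, 32

# A triplane is three axis-aligned feature planes (XY, XZ, YZ).
# Stacked as multi-channel 2D images, they become "pseudo-images"
# that a 2D text-to-image diffusion model can be fine-tuned to generate.
triplane = rng.standard_normal((3, C, H, W)).astype(np.float32)

def query_triplane(planes, points):
    """Feature for each 3D point in [-1, 1]^3: project the point onto the
    three planes, look up a feature on each, and sum the three features.
    (Nearest-neighbour lookup here; real systems use bilinear sampling.)"""
    n, c = points.shape[0], planes.shape[1]
    # Projections of (x, y, z) onto the XY, XZ, and YZ planes.
    proj = [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]]
    feats = np.zeros((n, c), dtype=planes.dtype)
    for plane, uv in zip(planes, proj):
        h, w = plane.shape[1], plane.shape[2]
        # Map [-1, 1] coordinates to integer pixel indices.
        col = np.clip(np.round((uv[:, 0] + 1) / 2 * (w - 1)).astype(int), 0, w - 1)
        row = np.clip(np.round((uv[:, 1] + 1) / 2 * (h - 1)).astype(int), 0, h - 1)
        feats += plane[:, row, col].T  # (N, C) per-plane features
    return feats

pts = rng.uniform(-1, 1, size=(5, 3)).astype(np.float32)
features = query_triplane(triplane, pts)
print(features.shape)  # (5, 8)
```

In practice such point features would be decoded into density and color for volume rendering; the key property PI3D exploits is that the planes themselves live in image space, so a pre-trained image diffusion model only needs fine-tuning, not training from scratch.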
The authors show that PI3D significantly outperforms existing text-to-3D generation methods in terms of visual quality, 3D consistency, and generation speed. It can generate a single 3D shape from text in only 3 minutes, bringing new possibilities for efficient 3D content creation.
The paper also includes an ablation study that examines the impact of depth loss in triplane fitting, the probability of training with real images, and the classifier-free guidance scale.
Key Statistics
PI3D can generate a single 3D shape from text in only 3 minutes.
PI3D significantly outperforms existing text-to-3D generation methods in terms of CLIP Score and CLIP R-Precision metrics.
Quotes
"PI3D, a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes."
"The core idea is to connect the 2D and 3D domains by representing a 3D shape as a set of Pseudo RGB Images."
"PI3D generates a single 3D shape from text in only 3 minutes and the quality is validated to outperform existing 3D generative models by a large margin."