This paper surveys the rapidly developing field of text-to-3D generation, exploring core technologies, seminal methods, enhancement directions, and applications, ultimately highlighting its potential to revolutionize 3D content creation.
Layout-Your-3D enables efficient and controllable generation of complex 3D scenes from text prompts by leveraging 2D layouts as blueprints, outperforming existing methods in speed and accuracy.
JointDreamer introduces Joint Score Distillation (JSD), a novel method that enhances 3D consistency in text-to-3D generation by modeling inter-view coherence, effectively addressing the limitations of traditional Score Distillation Sampling (SDS).
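For context on what JSD extends, the standard SDS gradient can be sketched as follows: noise a rendered view to a random timestep, have the diffusion model predict the noise, and use the residual as the gradient on the rendering (skipping the U-Net Jacobian). This is a minimal toy sketch; `mock_denoiser` is a hypothetical stand-in for a pretrained text-conditioned diffusion model, and the weighting `w(t) = 1 - alpha_bar` is one common choice, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

def mock_denoiser(x_t, t):
    # Hypothetical stand-in for a pretrained diffusion model's
    # noise prediction eps_hat(x_t; y, t); a real setup would call
    # a text-conditioned U-Net here.
    return 0.1 * x_t

def sds_grad(render, t, alpha_bar):
    """One SDS gradient for a flattened rendered view `render`.

    Noises the rendering to timestep t, asks the (mock) diffusion
    model to predict the noise, and returns w(t) * (eps_hat - eps),
    the score-distillation gradient w.r.t. the rendering.
    """
    eps = rng.standard_normal(render.shape)
    x_t = np.sqrt(alpha_bar) * render + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = mock_denoiser(x_t, t)
    w = 1.0 - alpha_bar  # common weighting choice
    return w * (eps_hat - eps)

render = rng.standard_normal(16)
g = sds_grad(render, t=500, alpha_bar=0.5)
print(g.shape)
```

Because each view is optimized against this gradient independently, vanilla SDS carries no inter-view coupling, which is the gap JSD's joint, coherence-aware distillation targets.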
Semantic Score Distillation Sampling (SemanticSDS) improves compositional text-to-3D generation by incorporating semantic embeddings and region-specific denoising, enabling the creation of complex scenes with multiple, detailed objects and interactions.
SeMv-3D is a novel framework that leverages triplane priors and a two-step learning process to generate semantically consistent and multi-view coherent 3D objects from text descriptions.
BoostDream seamlessly combines differentiable rendering with text-to-image advances to efficiently refine coarse 3D assets into high-quality 3D content guided by text prompts.
DreamMesh is a novel text-to-3D generation framework that leverages explicit 3D representation of triangle meshes to produce high-quality 3D models with clean geometry and rich texture details.
TPA3D is a GAN-based framework that efficiently generates high-quality 3D textured meshes closely aligned with detailed text descriptions, without relying on human-annotated text-3D pairs for training.
DreamMapping introduces a novel Variational Distribution Mapping (VDM) approach that efficiently models the distribution of rendered images by treating them as degraded versions of diffusion model outputs. This, combined with a Distribution Coefficient Annealing (DCA) strategy, enables the generation of high-fidelity and realistic 3D assets from text prompts.
DreamTime significantly improves the quality and diversity of text-to-3D content generation by aligning the 3D optimization process with the sampling process of pre-trained diffusion models.
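The core idea of aligning optimization with sampling can be sketched as a non-increasing timestep schedule: rather than drawing t uniformly at every iteration, anneal t from large to small so the 3D optimization visits timesteps in the same coarse-to-fine order as diffusion sampling. The linear anneal and the bounds `t_max`/`t_min` below are illustrative simplifications, not DreamTime's exact prior-weighted schedule.

```python
def annealed_timestep(step, total_steps, t_max=980, t_min=20):
    """Illustrative non-increasing timestep schedule.

    Maps optimization progress (step / total_steps) linearly onto
    diffusion timesteps, so early iterations use large t (coarse
    structure) and late iterations use small t (fine detail).
    """
    frac = step / max(total_steps - 1, 1)
    return int(round(t_max + frac * (t_min - t_max)))

# Early steps get large timesteps, late steps small ones.
schedule = [annealed_timestep(s, 100) for s in range(100)]
print(schedule[0], schedule[-1])
```

A uniform-random t, by contrast, can ask the model to supply fine detail before coarse structure exists, which is the mismatch this scheduling removes.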