The paper presents a comprehensive analysis and review of Score Distillation Sampling (SDS)-based text-to-3D generation methods, and identifies their key limitation: these approaches fail to accurately model the variational distribution of rendered images during optimization.
To address this, the authors propose Variational Distribution Mapping (VDM), which treats rendered images as degraded versions of diffusion model outputs. VDM trains a lightweight neural network to model this degradation process, eliminating the need for costly Jacobian computations through the diffusion model's UNet and allowing the variational distribution of rendered images to be constructed efficiently.
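The summary does not describe the network's architecture; as a rough sketch, a small convolutional network conditioned on the diffusion timestep could play this role. Everything below (the class name, layer sizes, and timestep conditioning) is an illustrative assumption, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DegradationNet(nn.Module):
    """Hypothetical lightweight degradation model in the spirit of VDM.
    It maps a noisy rendered image plus a normalized diffusion timestep to
    a noise estimate, supplying a variational term without any Jacobian
    computation through the diffusion UNet."""

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t_norm: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar timestep into an extra image channel so the
        # network can condition on the current noise level.
        t_map = t_norm.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        return self.net(torch.cat([x_t, t_map], dim=1))
```

Because such a network is orders of magnitude smaller than the diffusion UNet, training it alongside the 3D representation adds little overhead compared with differentiating through the UNet itself.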
Additionally, the authors introduce Distribution Coefficient Annealing (DCA), a strategy that applies a time-dependent coefficient to accommodate the dynamic changes in the rendered image distribution, further improving generation quality.
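As a hedged illustration of the annealing idea (the paper's actual schedule, and whether it depends on the diffusion timestep, the optimization step, or both, may differ), a simple linearly decaying coefficient could look like this:

```python
def dca_coefficient(step: int, total_steps: int,
                    start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical DCA-style schedule: linearly anneal the weight on the
    variational (distribution) term as optimization proceeds. The paper's
    exact schedule is not given in this summary and may differ."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac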
Integrating VDM and DCA, the authors develop a text-to-3D generation framework that uses 3D Gaussian Splatting as the 3D representation. Extensive experiments and evaluations demonstrate that VDM and DCA generate high-fidelity, realistic 3D assets with efficient optimization, outperforming state-of-the-art methods in semantic coherence and visual quality.
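One plausible way the two components slot into a score-distillation loop over a Gaussian Splatting scene is sketched below. It reuses `DegradationNet` and `dca_coefficient` from the sketches above; `render_fn`, `unet_eps`, the optimizers, and all hyperparameters are placeholders, not the authors' released code, and the SDS-style weighting w(t) is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def vdm_dca_step(step, total_steps, render_fn, unet_eps, deg_net,
                 alphas_cumprod, opt_scene, opt_deg, text_emb,
                 t_min=20, t_max=980):
    """One hedged optimization step: frozen 2D diffusion prior plus a
    VDM-style variational term scaled by a DCA-annealed coefficient."""
    x = render_fn()                                 # differentiable splat render
    t = torch.randint(t_min, t_max, (x.shape[0],), device=x.device)
    noise = torch.randn_like(x)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x + (1.0 - a).sqrt() * noise   # forward diffusion of render

    with torch.no_grad():
        eps_prior = unet_eps(x_t, t, text_emb)      # frozen pretrained prior
        eps_var = deg_net(x_t, t.float() / 1000.0)  # VDM variational term

    lam = dca_coefficient(step, total_steps)        # DCA annealing

    # Residual-score update of the scene parameters; the constant gradient
    # flows back through the renderer only, so no UNet Jacobian is needed.
    x.backward(gradient=(eps_prior - lam * eps_var))
    opt_scene.step(); opt_scene.zero_grad()

    # Fit the degradation net with a standard denoising objective on the
    # detached noisy render.
    loss_deg = F.mse_loss(deg_net(x_t.detach(), t.float() / 1000.0), noise)
    loss_deg.backward()
    opt_deg.step(); opt_deg.zero_grad()
```

The key design point the sketch is meant to convey is that the diffusion prior stays frozen while only the small degradation network and the 3D scene are updated each step.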
The paper also discusses the generalizability of the proposed methods, showing that they apply to other 3D representations, such as NeRF, as well as to text-to-2D generation tasks.
Source: Zeyu Cai, Du... (arxiv.org, 09-10-2024), https://arxiv.org/pdf/2409.05099.pdf