Key Concepts
Magic-Boost is a multi-view conditioned diffusion model that significantly refines coarse 3D generative results in a brief SDS optimization stage, drawing precise guidance from synthesized multi-view images.
Summary
The content discusses the development of Magic-Boost, a multi-view conditioned diffusion model, to enhance the quality of coarse 3D generative results.
Key highlights:
Recent progress in 2D diffusion models has enabled efficient 3D content creation by leveraging pre-trained 2D models. However, the generated results still lack intricate textures and complex geometries due to local inconsistencies and limited resolution.
To address this, the authors propose Magic-Boost, a multi-view conditioned diffusion model that takes pseudo-generated multi-view images as input, implicitly encodes 3D information, and provides precise SDS guidance to refine the coarse 3D outputs within a brief interval (∼15 minutes).
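For context, SDS here refers to Score Distillation Sampling. A sketch of its widely used gradient formulation follows; the symbol c standing for the multi-view image condition is this note's notation for how Magic-Boost's guidance plugs in, not necessarily the paper's:

```latex
% Standard SDS gradient w.r.t. the 3D parameters \theta:
% x = g(\theta) is a rendered view, x_t its noised version at timestep t,
% \hat{\epsilon}_\phi is the diffusion model's noise prediction under
% condition c (here: the synthesized multi-view images), and w(t) is a
% timestep-dependent weighting.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, c,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Refining "with small noise levels," as mentioned later in this summary, corresponds to starting from the coarse model and restricting t to the low end of the timestep range, so the optimization polishes details rather than reshaping the asset.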
The model employs a denoising U-Net to efficiently extract dense local features from multi-view inputs, and a self-attention mechanism to enable interactions and information sharing across different views.
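The cross-view information sharing described above can be illustrated with a toy self-attention in which tokens from all views are concatenated into one sequence, so every token attends to every view. This is a minimal sketch of the general mechanism only: the actual model uses a denoising U-Net's attention layers with learned query/key/value projections, which are omitted here.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def multiview_self_attention(views):
    """Toy self-attention across views.

    `views` is a list of views, each view a list of token vectors
    (lists of floats). Tokens from all views are flattened into one
    sequence; queries, keys, and values are the tokens themselves
    (no learned projections in this sketch).
    """
    tokens = [tok for view in views for tok in view]
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this token against every token,
        # including those from other views.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Weighted sum of all tokens -> each output mixes information
        # from every view.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out
```

With two single-token views, each output token becomes a blend of both views' tokens, which is the point of concatenating before attending: consistency is enforced jointly rather than per view.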
The authors introduce data augmentation strategies, including random drop, random scale, and noise disturbance, to facilitate the training process and improve the model's robustness.
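The three augmentations named above might look roughly like the following sketch. The parameter names, value ranges, and exact forms (e.g., dropping a view by zeroing it, multiplicative scaling, additive Gaussian noise) are illustrative assumptions, not the paper's implementation:

```python
import random

def augment_views(views, drop_p=0.3, scale_range=(0.9, 1.1),
                  noise_std=0.05, rng=None):
    """Toy multi-view augmentation: random drop, random scale,
    and noise disturbance.

    `views` is a list of images, each a flat list of pixel floats.
    All parameter values here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    # Random drop: zero out a conditioning view with probability drop_p
    # (the first view is kept as the reference input in this sketch).
    kept = [views[0]] + [
        ([0.0] * len(v) if rng.random() < drop_p else v)
        for v in views[1:]
    ]
    out = []
    for v in kept:
        s = rng.uniform(*scale_range)  # random scale factor per view
        # Noise disturbance: additive Gaussian noise per pixel.
        out.append([s * p + rng.gauss(0.0, noise_std) for p in v])
    return out
```

Training under such perturbations plausibly serves the stated goal: the model cannot rely on any single view being present, clean, or at a fixed scale, which improves robustness to imperfect pseudo-generated inputs.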
An Anchor Iterative Update loss is proposed to alleviate the over-saturation problem in SDS optimization, leading to high-quality generation results with detailed geometry and realistic textures.
Extensive experiments demonstrate that Magic-Boost significantly enhances the quality of coarse 3D inputs, efficiently generating high-quality 3D assets with rich geometric and textural details.
Statistics
"Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently."
"Instant3D firstly finetune the pre-trained 2D diffusion models to unlock the ability of multi-view image generation, and then utilize a robust reconstruction model to derive 3D representations."
"Wonder3D finetunes the 2D diffusion model with cross-domain attention layers to enhance the 3D consistency of generative outputs."
Quotes
"Commencing with a coarse 3D model, efforts have been made to refine it through SDS optimization with small noise levels, utilizing text or single-view conditioned diffusion models."
"We argue that both text and single-view image conditions are inadequate in providing explicit control and precise guidance."