Key Concepts
Frankenstein introduces a tri-plane diffusion-based framework for generating semantic-compositional 3D scenes in a single pass.
Summary
The paper introduces Frankenstein, a tri-plane diffusion-based framework for generating semantic-compositional 3D scenes. The method generates multiple separated shapes simultaneously, each representing a semantically meaningful part. Training compresses the tri-planes into a latent space and then applies denoising diffusion to approximate the distribution of compositional scenes. Frankenstein shows promising results in generating room interiors and human avatars with automatically separated parts, which enables downstream applications such as part-wise re-texturing and object rearrangement.
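The diffusion step mentioned above can be illustrated with a minimal sketch of the standard DDPM forward (noising) process applied to a latent tri-plane. This is not the authors' implementation: the linear noise schedule, the 1000-step count, the flattened 3 x 4 x 8 x 8 latent shape, and all function names are assumptions chosen for illustration, since the summary does not state them.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default; assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(latent, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * x + sigma * e for x, e in zip(latent, noise)]
    return noisy, noise


# Hypothetical latent tri-plane: 3 planes x 4 channels x 8 x 8, flattened.
# These sizes are placeholders, not values taken from the paper.
rng = random.Random(0)
latent = [rng.uniform(-1.0, 1.0) for _ in range(3 * 4 * 8 * 8)]

alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
noisy, noise = add_noise(latent, t=500, alpha_bars=alpha_bars, rng=rng)
```

In a typical DDPM training loop, a denoising network would be trained to predict `noise` from `noisy` and the timestep `t`; whether Frankenstein uses this exact objective is not stated in the summary.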
Outline:
- Introduction
  - Importance of high-quality 3D assets in computer vision and graphics applications.
  - Progress in denoising diffusion models and Transformers, which has accelerated 3D generative models.
- Related Work
  - Overview of studies on 3D generation models using different technical solutions.
- Method
  - Details of the proposed framework for the room generation task.
- Experiments
  - Dataset details and implementation specifics.
- Conclusion
  - Summary of Frankenstein's capabilities in generating semantic-compositional 3D scenes.
Statistics
Yan et al. demonstrate promising results in generating room interiors as well as human avatars with automatically separated parts.
The final dataset contains 2558 bedrooms with 3 classes {wall, bed, cabinet}.
The hyperparameters are empirically set to L = 3, C = 32, Rh = 160, Rl = 5, M = 300000, c = 4, r = 40.
Quotes
"We propose the first 3D diffusion model that can generate semantic compositional scenes in one tri-plane with a single forward pass."
"We develop a robust coarse-to-fine optimization approach to produce high-fidelity semantic-compositional tri-planes."