The paper introduces two key contributions to the field of 3D scene generation:
A novel depth completion model that predicts depth maps conditioned on the existing scene geometry, improving the geometric coherence of the generated scenes. The model is trained in a self-supervised manner using teacher distillation and self-training; a sketch of one possible training objective appears after the second contribution.
A new benchmarking scheme for evaluating the geometric quality of scene generation methods against ground truth depth data. This allows assessing the consistency and accuracy of the generated 3D structure, going beyond purely visual quality metrics; typical depth-error metrics are sketched below.
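To make the training recipe above concrete, here is a minimal sketch of an objective combining the two self-supervision signals. It assumes a PyTorch setup; the function name, the L1 losses, and the equal weighting of the two terms are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def depth_completion_loss(student_pred, teacher_pred, known_depth, known_mask):
    """Combine the two self-supervision signals described above.
    Shapes: (B, 1, H, W); known_mask is 1 where depth is already
    fixed by the existing scene geometry, 0 where it must be completed."""
    unknown = 1.0 - known_mask
    # Teacher distillation: in regions with no existing geometry,
    # match the depth predicted by a pretrained monocular teacher.
    distill = F.l1_loss(student_pred * unknown, teacher_pred * unknown)
    # Self-training / consistency: where geometry already exists,
    # the completed depth must reproduce it.
    consistency = F.l1_loss(student_pred * known_mask, known_depth * known_mask)
    # Equal weighting of the two terms is an assumption for illustration.
    return distill + consistency
```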
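Likewise, a small sketch of the kind of depth-error metrics such a geometry benchmark could compute against ground truth depth. The three metrics shown (absolute relative error, RMSE, and the δ < 1.25 threshold accuracy) are standard in depth estimation; whether the paper reports exactly these is an assumption.

```python
import numpy as np

def depth_metrics(pred, gt, valid=None):
    """Compare predicted depth against ground truth over valid pixels.
    Returns three standard depth-estimation error metrics."""
    if valid is None:
        valid = gt > 0                      # ignore pixels without ground truth
    p, g = pred[valid], gt[valid]
    abs_rel = float(np.mean(np.abs(p - g) / g))        # absolute relative error
    rmse = float(np.sqrt(np.mean((p - g) ** 2)))       # root mean squared error
    delta1 = float(np.mean(np.maximum(p / g, g / p) < 1.25))  # threshold accuracy
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```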
The authors show that existing scene generation methods suffer from geometric inconsistencies, which are uncovered by the proposed benchmark. Their depth inpainting model significantly outperforms prior approaches in terms of geometric fidelity, while also maintaining high visual quality.
The pipeline first uses a generative model such as Stable Diffusion to hallucinate new scene content beyond the initial input. It then leverages the depth inpainting model to predict depth maps consistent with the existing scene geometry, seamlessly integrating the new content. Additional support views are generated to further constrain the scene and fill in occluded regions. Finally, the point cloud representation is converted, via Gaussian splatting optimization, into a smooth final 360-degree scene.
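A hypothetical sketch of this generation loop, to make the ordering of the steps explicit. All helper names (render_scene, inpaint_rgb, complete_depth, unproject, merge, fit_gaussian_splats) are placeholders standing in for the components described above, not the authors' actual API.

```python
# All helpers below are hypothetical placeholders for the components
# described in the text, not the authors' actual API.

def generate_scene(initial_points, cameras):
    """Iteratively grow a point cloud, then convert it to Gaussian splats."""
    points = initial_points
    for cam in cameras:                                # includes support views
        rgb, depth, mask = render_scene(points, cam)   # mask marks unseen pixels
        rgb = inpaint_rgb(rgb, mask)                   # e.g. Stable Diffusion inpainting
        depth = complete_depth(rgb, depth, mask)       # conditioned on existing geometry
        points = merge(points, unproject(rgb, depth, mask, cam))  # add new content only
    return fit_gaussian_splats(points)                 # final smooth 360-degree scene
```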