The paper introduces CompoNeRF, a framework for generating coherent multi-object 3D scenes from textual descriptions. Key highlights:
Interpretation of multi-object text prompts as an editable 3D scene layout, with each object represented by a distinct NeRF paired with its own subtext prompt (see the composition sketch after this list).
A composition module that blends these NeRFs into a single coherent scene, while dual-level text guidance (a global scene prompt plus local per-object prompts) reduces ambiguity and improves semantic accuracy (a loss sketch follows the code below).
The composition design also permits decomposition, so a scene can be edited and recomposed into new scenes from an edited layout or updated text prompts (a hypothetical editing example closes the sketches below).
Quantitative and qualitative evaluations demonstrate that CompoNeRF outperforms existing text-to-3D methods in generating multi-object scenes that closely align with textual prompts.
User studies confirm the improved semantic accuracy, multi-view consistency, and individual object recognizability of the generated 3D scenes.
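To make the layout and composition ideas concrete, here is a minimal sketch, not the authors' implementation: each object is a small NeRF confined to a layout box and tagged with its own subtext prompt, and a composition step merges densities and colors along shared rays before volume rendering. The names (`ObjectNeRF`, `compose_and_render`) and the axis-aligned box parameterization are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ObjectNeRF(nn.Module):
    """One per-object NeRF, paired with its own subtext prompt and layout box."""

    def __init__(self, prompt, box_min, box_max, hidden=64):
        super().__init__()
        self.prompt = prompt  # local text-guidance target for this object
        self.register_buffer("box_min", torch.tensor(box_min))
        self.register_buffer("box_max", torch.tensor(box_max))
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs (r, g, b, sigma)
        )

    def forward(self, pts):
        # Map world-space points into the object's unit-box frame.
        local = (pts - self.box_min) / (self.box_max - self.box_min)
        out = self.mlp(local)
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3])
        # Zero density outside the layout box so the object stays in place.
        inside = ((local >= 0.0) & (local <= 1.0)).all(dim=-1)
        return rgb, sigma * inside.float()


def compose_and_render(objects, rays_o, rays_d, n_samples=64, near=0.1, far=4.0):
    """Blend all object NeRFs along shared rays, then volume-render to RGB."""
    t = torch.linspace(near, far, n_samples)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]
    sigma_sum = torch.zeros(pts.shape[:-1])
    rgb_acc = torch.zeros(*pts.shape[:-1], 3)
    for obj in objects:  # the composition step: sum densities, mix colors
        rgb, sigma = obj(pts)
        sigma_sum = sigma_sum + sigma
        rgb_acc = rgb_acc + rgb * sigma[..., None]
    rgb_mix = rgb_acc / (sigma_sum[..., None] + 1e-8)  # density-weighted color
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma_sum * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1,
    )[:, :-1]
    weights = alpha * trans  # standard volume-rendering weights
    return (weights[..., None] * rgb_mix).sum(dim=1)  # (num_rays, 3)
```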
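The dual-level guidance from the second highlight can be sketched as one global score-distillation term on the composed render plus a local term on each object's individual render. Here `sds_loss` is a hypothetical stand-in for a score-distillation sampling loss against a pretrained text-to-image model, not a real library call, and `w_local` is an assumed weighting knob.

```python
def dual_level_loss(objects, composed_rgb, object_rgbs, sds_loss,
                    global_prompt, w_local=1.0):
    """Global SDS on the composed scene plus a local SDS term per object.

    `sds_loss(image, prompt)` is a hypothetical callable standing in for a
    score-distillation loss from a pretrained text-to-image model.
    """
    loss = sds_loss(composed_rgb, global_prompt)           # global guidance
    for obj, rgb in zip(objects, object_rgbs):
        loss = loss + w_local * sds_loss(rgb, obj.prompt)  # local guidance
    return loss
```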
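Finally, a hypothetical editing flow for the third highlight: under this sketch's assumptions, moving one object's layout box and recomposing leaves the other objects' NeRFs untouched. The prompts, boxes, and camera rays below are illustrative only.

```python
# Hypothetical scene: two objects, each with its own subtext prompt and box.
apple = ObjectNeRF("a red apple", box_min=[-0.5, 0.0, -0.5], box_max=[0.0, 0.5, 0.0])
vase = ObjectNeRF("a blue vase", box_min=[0.1, 0.0, -0.3], box_max=[0.5, 0.8, 0.1])
rays_o = torch.tensor([[0.0, 0.25, -2.0]]).repeat(8, 1)
rays_d = torch.tensor([[0.0, 0.0, 1.0]]).repeat(8, 1)
colors = compose_and_render([apple, vase], rays_o, rays_d)

# Edit: shift the vase's layout box, then recompose into a new scene.
vase.box_min += torch.tensor([0.2, 0.0, 0.0])
vase.box_max += torch.tensor([0.2, 0.0, 0.0])
colors_edited = compose_and_render([apple, vase], rays_o, rays_d)
```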
Key ideas extracted from the paper by Haotian Bai et al., arxiv.org, 09-24-2024. Source: https://arxiv.org/pdf/2303.13843.pdf