The paper introduces CompoNeRF, a framework for generating coherent multi-object 3D scenes from textual descriptions. Key highlights:
Interpretation of multi-object text prompts as editable 3D scene layouts, with each object represented by a distinct NeRF paired with a corresponding subtext prompt.
A composition module seamlessly blends these per-object NeRFs into a single scene, promoting global consistency, while dual-level text guidance (a global scene prompt plus per-object subtext prompts) reduces ambiguity and improves accuracy (see the code sketches after this list).
The composition design permits decomposition, enabling flexible scene editing and recomposition into new scenes based on the edited layout or text prompts.
Quantitative and qualitative evaluations demonstrate that CompoNeRF outperforms existing text-to-3D methods in generating multi-object scenes that closely align with textual prompts.
User studies confirm the improved semantic accuracy, multi-view consistency, and individual object recognizability of the generated 3D scenes.
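To make the composition idea concrete, below is a minimal sketch of how per-object NeRFs confined to layout bounding boxes can be blended into one scene field. This is an illustration under stated assumptions, not the authors' implementation: the names `ObjectNeRF` and `compose`, the box parameterization as (center, half-size), and the density-weighted color blending rule are hypothetical stand-ins for CompoNeRF's actual modules.

```python
# Hedged sketch of layout-conditioned NeRF composition (not the paper's code).
import torch
import torch.nn as nn

class ObjectNeRF(nn.Module):
    """One NeRF per object, queried in its own box-local coordinates."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs (r, g, b, sigma)
        )

    def forward(self, x_local):
        out = self.mlp(x_local)
        rgb = torch.sigmoid(out[..., :3])   # colors in [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma

def compose(points, nerfs, boxes):
    """Blend per-object NeRFs into one scene field.

    points: (N, 3) world-space samples; boxes: list of (center, half_size).
    Densities add; colors are density-weighted, a common compositing rule.
    """
    sigma_total = torch.zeros(points.shape[0], device=points.device)
    rgb_total = torch.zeros(points.shape[0], 3, device=points.device)
    for nerf, (center, half) in zip(nerfs, boxes):
        local = (points - center) / half           # map into the unit box
        inside = (local.abs() <= 1.0).all(dim=-1)  # mask points in the box
        rgb, sigma = nerf(local)
        sigma = sigma * inside                     # zero density outside box
        rgb_total += sigma[:, None] * rgb
        sigma_total += sigma
    rgb_total = rgb_total / (sigma_total[:, None] + 1e-8)
    return rgb_total, sigma_total

# Example layout: two objects with (center, half-size) boxes. Editing the
# scene then amounts to moving, removing, or swapping a (box, NeRF,
# subprompt) triple and recomposing, which is why decomposition and
# recomposition fall out of the same design.
boxes = [(torch.tensor([-0.5, 0.0, 0.0]), torch.tensor([0.4, 0.4, 0.4])),
         (torch.tensor([ 0.5, 0.0, 0.0]), torch.tensor([0.4, 0.4, 0.4]))]
nerfs = [ObjectNeRF(), ObjectNeRF()]
pts = torch.rand(1024, 3) * 2 - 1              # samples in [-1, 1]^3
rgb, sigma = compose(pts, nerfs, boxes)        # composed scene field
```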
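The dual-level guidance can likewise be sketched as a sum of score-distillation losses: one on the composed scene render against the global prompt, and one per object render against its subtext prompt. The `sds` argument below is a placeholder for a score-distillation loss (in the style of DreamFusion's SDS); the function name and signature are assumptions, not CompoNeRF's API.

```python
def dual_level_loss(scene_render, object_renders, global_emb, local_embs, sds):
    """Hedged sketch: the global prompt supervises the composed render, while
    each subtext prompt supervises its own object's render; the local terms
    are what reduce cross-object ambiguity in multi-object scenes."""
    loss = sds(scene_render, global_emb)            # global guidance
    for render_i, emb_i in zip(object_renders, local_embs):
        loss = loss + sds(render_i, emb_i)          # local guidance per object
    return loss
```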
Key insights distilled from the paper by Haotian Bai et al. (arxiv.org, 09-24-2024): https://arxiv.org/pdf/2303.13843.pdf