The paper introduces CompoNeRF, a framework for generating coherent multi-object 3D scenes from textual descriptions. Key highlights:
Interpretation of multi-object text prompts as editable 3D scene layouts, with each object represented by a distinct NeRF paired with a corresponding subtext prompt.
A composition module seamlessly blends these NeRFs to maintain consistency across the scene, while dual-level text guidance (a global scene prompt plus per-object local prompts) reduces ambiguity and improves semantic accuracy.
The composition design permits decomposition, enabling flexible scene editing and recomposition into new scenes based on the edited layout or text prompts.
Quantitative and qualitative evaluations demonstrate that CompoNeRF outperforms existing text-to-3D methods in generating multi-object scenes that closely align with textual prompts.
User studies confirm the improved semantic accuracy, multi-view consistency, and individual object recognizability of the generated 3D scenes.
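The composition idea above can be illustrated with a minimal sketch: each object is a separate radiance field defined in its own local frame, a layout places it in the global scene, and a composite field blends the per-object densities and colors at every sample point. This is a toy stand-in under assumed conventions (translation-only layout boxes, a soft-sphere field in place of a learned MLP, density-weighted color mixing), not the paper's actual implementation.

```python
import numpy as np

def make_toy_nerf(radius):
    """Stand-in for one trained per-object NeRF, defined in its local frame.
    (Hypothetical: a soft sphere of the given radius with a random fixed color;
    the real method uses learned neural fields.)"""
    color = np.random.rand(3)
    def field(pts_local):
        d = np.linalg.norm(pts_local, axis=-1)
        sigma = np.maximum(0.0, 1.0 - d / radius)   # density falls off with distance
        rgb = np.broadcast_to(color, pts_local.shape).copy()
        return sigma, rgb
    return field

def composite_query(pts_global, nerfs, box_centers):
    """Blend several object NeRFs into one scene field.
    Each global sample point is mapped into each object's layout frame;
    densities are summed, and colors are mixed by density weight."""
    sigmas, rgbs = [], []
    for field, center in zip(nerfs, box_centers):
        s, rgb = field(pts_global - center)          # global -> local frame
        sigmas.append(s)
        rgbs.append(rgb)
    sigmas = np.stack(sigmas)                        # (num_objects, num_points)
    rgbs = np.stack(rgbs)                            # (num_objects, num_points, 3)
    total = sigmas.sum(axis=0)                       # composite density
    w = sigmas / np.clip(total, 1e-8, None)          # density-normalized weights
    color = (w[..., None] * rgbs).sum(axis=0)        # composite color
    return total, color
```

Because each object keeps its own field and layout slot, editing the scene amounts to moving a box center or swapping one field out, which is the decomposition/recomposition property the summary describes.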
Key insights distilled from the source by Haotian Bai, ... at arxiv.org, 09-24-2024.
https://arxiv.org/pdf/2303.13843.pdf