The paper introduces CompoNeRF, a framework for generating coherent multi-object 3D scenes from textual descriptions. Key highlights:
- Interpretation of multi-object text prompts as editable 3D scene layouts, with each object represented by a distinct NeRF paired with a corresponding subtext prompt.
- A composition module that seamlessly blends these NeRFs, promoting consistency, while dual-level text guidance (global and local) reduces ambiguity and boosts accuracy.
- The composition design permits decomposition, enabling flexible scene editing and recomposition into new scenes based on the edited layout or text prompts.
- Quantitative and qualitative evaluations demonstrate that CompoNeRF outperforms existing text-to-3D methods in generating multi-object scenes that closely align with textual prompts.
- User studies confirm the improved semantic accuracy, multi-view consistency, and individual object recognizability of the generated 3D scenes.
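To make the composition idea concrete, a common way to blend several per-object NeRFs at shared sample points along a ray is density-weighted color averaging followed by standard volume rendering. The sketch below illustrates that scheme in NumPy; the function names and the exact blending rule are assumptions for illustration, not CompoNeRF's actual module:

```python
import numpy as np

def compose_nerfs(sigmas, colors):
    """Blend per-object NeRF outputs at shared sample points.

    Uses density-weighted color averaging, a common composition
    scheme (an assumption here, not necessarily CompoNeRF's exact rule).
    sigmas: list of K arrays of shape (N,)   -- per-object densities
    colors: list of K arrays of shape (N, 3) -- per-object RGB values
    """
    sigmas = np.stack(sigmas, axis=0)           # (K, N)
    colors = np.stack(colors, axis=0)           # (K, N, 3)
    sigma_total = sigmas.sum(axis=0)            # composite density, (N,)
    # Each object's color contributes in proportion to its density.
    weights = sigmas / np.clip(sigma_total, 1e-8, None)[None, :]
    color_total = (weights[..., None] * colors).sum(axis=0)  # (N, 3)
    return sigma_total, color_total

def render_ray(sigma, color, deltas):
    """Standard NeRF volume rendering of the composed field along one ray."""
    alpha = 1.0 - np.exp(-sigma * deltas)                     # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    w = alpha * trans                                          # rendering weights
    return (w[:, None] * color).sum(axis=0)                    # final RGB
```

Because the composite field is just a sum of per-object densities, removing or editing one object's NeRF (decomposition) simply drops its term from the sum, which matches the edit-and-recompose workflow described above.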
By Haotian Bai et al., arxiv.org, 09-24-2024
https://arxiv.org/pdf/2303.13843.pdf