The paper introduces Gaga, a framework that reconstructs and segments open-world 3D scenes by utilizing inconsistent 2D masks predicted by zero-shot segmentation models. To address the challenge of label inconsistency across different views, Gaga employs a 3D-aware memory bank that collects and categorizes 3D Gaussians into groups. This allows Gaga to associate 2D masks across diverse camera poses by finding the group of 3D Gaussians that have the largest overlap with the deprojected mask.
The key steps are:
Extensive experiments on diverse datasets demonstrate that Gaga outperforms previous methods in terms of segmentation accuracy, multi-view consistency, and robustness to variations in camera poses and training data quantity. The high-quality 3D segmentation results also enable various downstream applications such as scene manipulation.
Til et annet språk
fra kildeinnhold
arxiv.org
Viktige innsikter hentet fra
by Weijie Lyu,X... klokken arxiv.org 04-12-2024
https://arxiv.org/pdf/2404.07977.pdfDypere Spørsmål