The paper introduces Gaga, a framework that reconstructs and segments open-world 3D scenes from the inconsistent 2D masks predicted by zero-shot segmentation models. To resolve label inconsistency across views, Gaga employs a 3D-aware memory bank that collects 3D Gaussians and categorizes them into groups. This lets Gaga associate 2D masks across diverse camera poses: each mask is matched to the group of 3D Gaussians that has the largest overlap with the deprojected mask.
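The association rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each 2D mask has already been deprojected into a set of 3D Gaussian indices, and that the memory bank is a plain list of such sets. The function name `associate_mask`, the overlap ratio (shared Gaussians divided by mask size), and the `min_overlap` threshold for opening a new group are all hypothetical choices made for illustration; the summary does not specify them.

```python
def associate_mask(mask_gaussians, memory_bank, min_overlap=0.1):
    """Assign one deprojected mask to a group in the memory bank.

    mask_gaussians: set of 3D Gaussian indices covered by the mask (assumed input).
    memory_bank:    list of sets; memory_bank[i] holds the Gaussians of group i.
    min_overlap:    hypothetical threshold below which a new group is created.
    Returns the group ID and updates memory_bank in place.
    """
    best_id, best_overlap = None, 0.0
    denom = max(len(mask_gaussians), 1)  # guard against an empty mask

    # Find the existing group sharing the largest fraction of the mask's Gaussians.
    for group_id, group in enumerate(memory_bank):
        overlap = len(mask_gaussians & group) / denom
        if overlap > best_overlap:
            best_id, best_overlap = group_id, overlap

    # No sufficiently overlapping group: treat the mask as a new object.
    if best_id is None or best_overlap < min_overlap:
        memory_bank.append(set(mask_gaussians))
        return len(memory_bank) - 1

    # Otherwise merge the mask's Gaussians into the best-matching group.
    memory_bank[best_id] |= mask_gaussians
    return best_id


# Usage: masks from different views that share Gaussians receive the same ID.
bank = []
id_view1 = associate_mask({1, 2, 3, 4}, bank)   # first mask opens a group
id_view2 = associate_mask({3, 4, 5}, bank)      # overlaps the first group
id_other = associate_mask({10, 11}, bank)       # disjoint, opens a second group
```

Because the group IDs come from 3D overlap rather than from per-view 2D labels, the same object keeps one ID regardless of how each view's segmentation model happened to number it.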
The key steps are:
Extensive experiments on diverse datasets demonstrate that Gaga outperforms previous methods in terms of segmentation accuracy, multi-view consistency, and robustness to variations in camera poses and training data quantity. The high-quality 3D segmentation results also enable various downstream applications such as scene manipulation.
Key insights from
by Weijie Lyu,X... at arxiv.org 04-12-2024
https://arxiv.org/pdf/2404.07977.pdf