Alapfogalmak
Gaga reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models, eliminating label inconsistency across views through a 3D-aware memory bank.
Kivonat
The paper introduces Gaga, a framework that reconstructs and segments open-world 3D scenes by utilizing inconsistent 2D masks predicted by zero-shot segmentation models. To address the challenge of label inconsistency across different views, Gaga employs a 3D-aware memory bank that collects and categorizes 3D Gaussians into groups. This allows Gaga to associate 2D masks across diverse camera poses by finding the group of 3D Gaussians that have the largest overlap with the deprojected mask.
The key steps are:
- Gaussian Splatting is used to reconstruct the 3D scene, and an open-world 2D segmentation model is applied to generate class-agnostic masks for each input image.
- A 3D-aware memory bank is initialized by storing the corresponding Gaussians of each mask in the first image. For subsequent images, masks are assigned to existing groups in the memory bank or a new group is created based on the overlap between the mask's Gaussians and the groups.
- The associated masks with consistent group IDs across views are then used as pseudo labels to train an identity encoding on each 3D Gaussian for segmentation rendering.
Extensive experiments on diverse datasets demonstrate that Gaga outperforms previous methods in terms of segmentation accuracy, multi-view consistency, and robustness to variations in camera poses and training data quantity. The high-quality 3D segmentation results also enable various downstream applications such as scene manipulation.
Statisztikák
The paper does not provide specific numerical data or statistics. The focus is on the technical approach and qualitative/quantitative evaluation of the proposed Gaga framework.
Idézetek
The paper does not contain any striking quotes that support the key logics. The content is presented in a technical, descriptive manner.