Gaga reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models, eliminating label inconsistency across views through a 3D-aware memory bank.
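One way a memory bank could reconcile per-view mask labels is sketched below. It assumes each 2D mask has already been lifted to a set of 3D primitive indices. The function name `assign_group_ids`, the overlap ratio, and the `min_overlap` threshold are illustrative assumptions, not details given in the summary.

```python
def assign_group_ids(view_masks, memory_bank, min_overlap=0.5):
    """Give each mask in one view a scene-wide group ID.

    view_masks:  list of sets of 3D primitive indices, one set per 2D mask
                 (hypothetical input; the lifting step is not shown).
    memory_bank: dict mapping group ID -> set of 3D primitive indices seen so
                 far; updated in place.
    """
    ids = []
    for points in view_masks:
        best_id, best_ratio = None, 0.0
        for gid, members in memory_bank.items():
            # Fraction of this mask's 3D primitives already stored in the group.
            ratio = len(points & members) / len(points) if points else 0.0
            if ratio > best_ratio:
                best_id, best_ratio = gid, ratio
        if best_id is None or best_ratio < min_overlap:
            # No sufficiently overlapping group: open a new one.
            best_id = len(memory_bank)
            memory_bank[best_id] = set()
        memory_bank[best_id] |= points
        ids.append(best_id)
    return ids


bank = {}
first_view = assign_group_ids([{1, 2, 3}, {10, 11}], bank)
second_view = assign_group_ids([{11, 12}, {2, 3, 4}], bank)
```

In this sketch the second view lists the same two objects in the opposite order, yet each mask receives the ID of the group it overlaps in 3D rather than an ID based on its position in the 2D prediction.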
This paper introduces a pipeline that generates a natural-language-queryable topological and semantic representation of a 3D indoor scene. The method extracts both the topology (rooms and transition regions) and semantic room labels, enabling efficient scene understanding and downstream applications such as robot navigation.
GP-NeRF is a unified learning framework that embeds NeRF and powerful 2D segmentation modules to perform context-aware 3D scene perception, achieving significant performance improvements over existing SOTA methods.
3D-VLAP is a weakly supervised 3D scene graph generation method that uses visual-linguistic interactions to generate pseudo-labels for objects and relations, thereby alleviating the need for extensive human annotation.
This paper introduces SceneGraphLoc, a method for localizing a query image within a database of 3D scene graphs that integrate multiple modalities, including object-level point clouds, images, attributes, and relationships. SceneGraphLoc learns a fixed-size embedding for each node in the scene graph, enabling effective matching with the objects visible in the input query image.
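Matching fixed-size embeddings can be illustrated with a minimal sketch. It assumes that embeddings for the query's visible objects and for each scene graph's nodes are already available as plain lists of floats. The helper names and the scoring rule (mean of best cosine similarities) are hypothetical choices for illustration, not the scoring used by SceneGraphLoc.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_scene(query_embs, node_embs):
    """Average, over query objects, of the best similarity to any scene node."""
    if not query_embs or not node_embs:
        return 0.0
    best = [max(cosine(q, n) for n in node_embs) for q in query_embs]
    return sum(best) / len(best)


def localize(query_embs, scenes):
    """Return the name of the scene graph whose nodes best match the query."""
    return max(scenes, key=lambda name: score_scene(query_embs, scenes[name]))


# Toy 2D embeddings; real embeddings would come from a learned encoder.
scenes = {
    "kitchen": [[1.0, 0.0], [0.0, 1.0]],
    "office": [[-1.0, 0.0], [0.0, -1.0]],
}
best_scene = localize([[0.9, 0.1]], scenes)
```

Because every node embedding has the same length, comparing a query against a whole database reduces to repeated vector similarity, which is what makes fixed-size embeddings convenient for retrieval.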
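The proposed GroupContrast framework provides an effective approach to self-supervised representation learning that combines segment grouping with semantic-aware contrastive learning.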