Efficient Cross-Modal Localization of Images in 3D Scene Graphs
This paper introduces SceneGraphLoc, a novel method for localizing a query image within a database of 3D scene graphs that integrate multiple modalities, including object-level point clouds, images, attributes, and relationships. SceneGraphLoc learns a fixed-size embedding for each node in the scene graph, enabling effective matching with the objects visible in the input query image.
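
To make the matching step concrete, the following is a minimal sketch of how fixed-size node embeddings could be compared against embeddings of the objects detected in a query image. It is an illustration under stated assumptions, not the method's actual implementation: the function names (`cosine_similarity`, `score_scene`, `localize`), the use of cosine similarity, the mean-of-best-matches scene score, and the toy 3-dimensional embeddings are all hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_scene(query_embeddings: List[Vector], node_embeddings: List[Vector]) -> float:
    """Assumed scoring rule: for each query object, take its best similarity
    to any node of the scene graph, then average over the query objects."""
    if not query_embeddings or not node_embeddings:
        return float("-inf")
    best_matches = [
        max(cosine_similarity(q, n) for n in node_embeddings)
        for q in query_embeddings
    ]
    return sum(best_matches) / len(best_matches)


def localize(query_embeddings: List[Vector], scene_db: Dict[str, List[Vector]]) -> str:
    """Return the id of the scene graph whose nodes best match the query objects."""
    if not scene_db:
        raise ValueError("scene database is empty")
    return max(scene_db, key=lambda sid: score_scene(query_embeddings, scene_db[sid]))


# Toy database: two scene graphs, each with two node embeddings (hypothetical values).
scene_db = {
    "kitchen": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "office": [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
}

# Embeddings of two objects visible in the query image (hypothetical values).
query = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.0]]

best_scene = localize(query, scene_db)
print(best_scene)  # prints "kitchen"
```

Because every node is reduced to a vector of the same length, retrieval in this sketch needs only pairwise similarity computations between query objects and graph nodes, so scene graphs built from different modalities can be compared on equal footing.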