The OpenSUN3D workshop challenge focuses on advancing open-vocabulary 3D scene understanding with methods that go beyond traditional closed-set object recognition. Its goal is to enable intelligent agents to interpret novel environments and carry out complex tasks without relying on costly labeled 3D data. Participants are asked to localize and segment object instances in 3D scenes from open-vocabulary text queries, which can describe semantics, materials, affordances, or situational context. The challenge dataset is built on ARKitScenes, which provides RGB-D image sequences together with 3D reconstructions for experimentation. The competition runs in two phases, development and test, to ensure robust evaluation of the submitted methods. The winning teams combined tools such as Grounding SAM, CLIP encoders, and SAM3D to produce accurate open-vocabulary instance segmentation in 3D scenes.
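To make the task concrete, the sketch below shows one plausible way such a pipeline could rank class-agnostic 3D instance proposals against an open-vocabulary text query using CLIP, in the spirit of the winning approaches. It is not the official baseline or any team's exact method; the proposal format and the idea of pre-cropped per-instance views (e.g., obtained by projecting a 3D mask from a segmenter such as SAM3D into the ARKitScenes RGB frames) are assumptions for illustration.

```python
# Hedged sketch: score 3D instance proposals against a text query with CLIP.
# Assumes each proposal already has a list of 2D image crops (hypothetical input);
# a real pipeline would render/project each 3D mask into the RGB-D frames.

from typing import List
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def rank_instances(query: str, instance_crops: List[List[Image.Image]]) -> List[float]:
    """Score each instance proposal by the best CLIP similarity between the
    text query and that instance's 2D crops (one list of crops per instance)."""
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    scores = []
    for crops in instance_crops:
        image_inputs = processor(images=crops, return_tensors="pt")
        img_emb = model.get_image_features(**image_inputs)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        # Max over views: the instance is scored by its most query-like crop.
        scores.append((img_emb @ text_emb.T).max().item())
    return scores

# Usage (hypothetical data): queries can reference affordances or materials,
# not just category names, e.g. "a cushion you could nap on".
# scores = rank_instances("a cushion you could nap on", crops_per_instance)
# best_instance = max(range(len(scores)), key=scores.__getitem__)
```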