A lightweight and scalable regional point-language contrastive learning framework, RegionPLC, is proposed to enable robust and effective 3D learning from dense regional language supervision for open-world 3D scene understanding.
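As a rough illustration of what a point-language contrastive objective can look like, the sketch below computes a generic symmetric InfoNCE-style loss between pooled region features and caption embeddings. This is not taken from RegionPLC itself: the function names, the temperature value, and the assumption that each region is paired with exactly one caption are all hypothetical choices made for the example.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_rows(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    # region_feats[i] and text_feats[i] are assumed to describe the same region.
    r = [l2_normalize(v) for v in region_feats]
    t = [l2_normalize(v) for v in text_feats]
    # Cosine similarity between every region and every caption, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(ri, tj)) / temperature for tj in t] for ri in r]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the region-to-text and text-to-region directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))

matched = region_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = region_text_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the loss for correctly paired features is lower than for swapped pairs, which is the behavior a contrastive objective is meant to reward.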
A scalable and incremental approach is proposed for constructing open-set 3D scene representations by leveraging 2D foundation models, enabling open-vocabulary instance retrieval, segmentation, annotation, and spatial reasoning.
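To make open-vocabulary instance retrieval concrete, here is a minimal sketch that ranks scene objects by cosine similarity to a text query embedding. The summary does not say how retrieval is carried out, so this assumes that objects and queries live in a shared embedding space (for instance, one produced by a vision-language model); `retrieve`, `cosine`, and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_emb, object_embs, top_k=1):
    # object_embs maps an object identifier to its embedding (assumed precomputed).
    ranked = sorted(
        object_embs.items(),
        key=lambda kv: cosine(query_emb, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Toy embeddings; real ones would come from a 2D foundation model.
objects = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
best = retrieve([0.9, 0.1, 0.0], objects)
top_two = retrieve([0.9, 0.1, 0.0], objects, top_k=2)
```

Because the query is free-form text rather than a label from a fixed set, the same ranking step works for categories that were never enumerated in advance.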