Provides a comprehensive overview of the challenges in open-vocabulary 3D scene understanding.
UniM-OV3D, a unified multimodal network, aligns point clouds with images, language, and depth to enable robust open-vocabulary 3D scene understanding by leveraging fine-grained feature representations.
FMGS efficiently integrates 3D Gaussian Splatting and multi-resolution hash encoding to represent 3D scenes, enabling high-quality rendering and fast open-vocabulary semantic queries.
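
To make "open-vocabulary semantic query" concrete, the following is a minimal sketch of the general idea: each scene element carries a feature vector, and a free-form text label is matched to it by cosine similarity. This is not FMGS's or UniM-OV3D's actual implementation. The function names (`cosine`, `query_labels`), the toy 3-dimensional embeddings, and the label set are all hypothetical and chosen only for illustration; a real system would presumably obtain the vectors from a learned vision-language encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def query_labels(point_features, text_embeddings):
    """For each point feature, return the label whose text embedding is most similar."""
    labels = []
    for feat in point_features:
        best = max(text_embeddings, key=lambda name: cosine(feat, text_embeddings[name]))
        labels.append(best)
    return labels

# Hypothetical toy embeddings; real ones would come from a trained encoder.
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

print(query_labels(point_features, text_embeddings))  # ['chair', 'table']
```

Because the label set is just a dictionary of text embeddings, new categories can be queried by adding entries, without retraining whatever produced the per-point features. That is the property the "open-vocabulary" term refers to.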