Key Concepts
SHAPELLM is a 3D multimodal Large Language Model designed for embodied interaction, achieving state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks.
Statistics
RECON++ achieves 95.25% accuracy on the ScanObjectNN benchmark.
SHAPELLM-13B surpasses the previous best result on the 3D MM-Vet benchmark by 5.1%.
Quotes
"SHAPELLM demonstrates superior performance in various tasks, including 3D captioning, 3D VQA, and embodied visual grounding."
"The model shows robust capabilities in knowledge representation, reasoning, and instruction-following dialogue."