Transcrib3D: Resolving 3D Referring Expressions using Large Language Models
Transcrib3D uses text as a unifying medium to bridge 3D scene parsing and referential reasoning, achieving state-of-the-art performance on 3D referring expression resolution benchmarks.
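
To make the idea of text as a unifying medium concrete, the following is a minimal sketch of how detected 3D objects might be transcribed into plain text and paired with a referring expression for a language model. It is not the authors' implementation: the `DetectedObject` fields, the line format, and the helper names `transcribe_scene` and `build_prompt` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedObject:
    """Hypothetical output record of a 3D detector (fields are assumptions)."""
    obj_id: int
    category: str
    center: Tuple[float, float, float]  # x, y, z in meters
    size: Tuple[float, float, float]    # extent along x, y, z in meters
    color: str


def transcribe_scene(objects: List[DetectedObject]) -> str:
    """Render each detected object as one line of text."""
    lines = []
    for o in objects:
        cx, cy, cz = o.center
        sx, sy, sz = o.size
        lines.append(
            f"obj{o.obj_id}: {o.category}, color={o.color}, "
            f"center=({cx:.2f}, {cy:.2f}, {cz:.2f}), "
            f"size=({sx:.2f}, {sy:.2f}, {sz:.2f})"
        )
    return "\n".join(lines)


def build_prompt(objects: List[DetectedObject], expression: str) -> str:
    """Combine the scene transcript and the referring expression into one prompt."""
    return (
        "Scene objects:\n"
        + transcribe_scene(objects)
        + "\n\nReferring expression: "
        + expression
        + "\nAnswer with the id of the referred object."
    )
```

In this sketch the language model never sees point clouds or images; all spatial information reaches it as text, so the reasoning step can be handled by any text-only model.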