Robust PointCloud-Text Matching: Benchmark Datasets and a Baseline Method
This paper introduces PointCloud-Text Matching (PTM), a novel instance-level cross-modal retrieval task whose goal is to find the exact matching instance between point clouds and detailed textual descriptions. Three new benchmark datasets (3D2T-SR, 3D2T-NR, and 3D2T-QA) are constructed to evaluate PTM. A robust baseline method, Robust PointCloud-Text Matching (RoMa), is proposed to tackle the key challenges of the task: perceiving both local and global features, and handling noisy correspondence.
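To make the instance-level retrieval setting concrete, the following minimal sketch ranks texts for each point cloud by cosine similarity and reports Recall@K. It is an illustration only, not the RoMa method. It assumes each point cloud and each text has already been encoded into a fixed-length embedding, and that the ground-truth text for point cloud `i` sits at index `i`. The names `cosine`, `recall_at_k`, `point_cloud_emb`, and `text_emb`, the toy vectors, and the choice of Recall@K as the metric are all hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recall_at_k(queries, gallery, k=1):
    """Fraction of queries whose true match (same index) ranks in the top k."""
    hits = 0
    for i, query in enumerate(queries):
        scores = [cosine(query, item) for item in gallery]
        # Indices of gallery items, highest similarity first.
        ranked = sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Toy embeddings (assumed, for illustration): row i of each list is a matched pair.
point_cloud_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_emb = [[0.9, 0.1, 0.0], [0.3, 0.7, 0.0], [0.0, 0.9, 0.1]]

print("Recall@1:", recall_at_k(point_cloud_emb, text_emb, k=1))
print("Recall@2:", recall_at_k(point_cloud_emb, text_emb, k=2))
```

In this toy data the second point cloud is closer to the third text than to its own description, so it counts as a miss at K=1 but a hit at K=2. Instance-level matching has to resolve exactly this kind of near-duplicate confusion.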