The paper introduces GeoAuxNet to address the challenge of processing point clouds captured by different sensors, such as RGB-D cameras and LiDAR, which exhibit non-negligible domain gaps. Existing methods typically design sensor-specific network architectures and train them separately on point clouds from each sensor.
The key contributions are:
Voxel-guided dynamic point network: The authors construct a hypernetwork that leverages voxel features and relative point positions to guide a point network in extracting fine-grained local geometric features (see the first sketch after this list).
Hierarchical geometry pools: The authors establish hierarchical geometry pools that store representative point-level geometric features for each stage of the voxel-based backbone, allowing the voxel representations to access detailed spatial information efficiently (second sketch below).
Geometry-to-voxel auxiliary learning: The authors introduce a geometry-to-voxel auxiliary mechanism that fuses the point-level geometric features stored in the pools into the voxel representations, helping the voxel-based backbone generalize across multi-sensor point clouds (third sketch below).
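To make the first contribution concrete, below is a minimal sketch of a voxel-guided dynamic point layer. It assumes the hypernetwork maps each voxel feature to the weights of a small per-voxel linear map applied to the relative positions of the points inside that voxel; the class and layer names (`VoxelGuidedPointLayer`, `weight_gen`, `bias_gen`) and all sizes are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class VoxelGuidedPointLayer(nn.Module):
    """Hypernetwork-conditioned point layer (illustrative sketch)."""

    def __init__(self, voxel_dim: int, out_dim: int):
        super().__init__()
        # Hypernetwork: predicts the weight and bias of a 3 -> out_dim
        # linear map from each voxel feature, so the point network is
        # dynamically conditioned on the voxel it serves.
        self.weight_gen = nn.Linear(voxel_dim, 3 * out_dim)
        self.bias_gen = nn.Linear(voxel_dim, out_dim)

    def forward(self, voxel_feat: torch.Tensor, rel_pos: torch.Tensor) -> torch.Tensor:
        """voxel_feat: (V, voxel_dim); rel_pos: (V, K, 3) offsets of the
        K points in each voxel. Returns (V, out_dim) local geometric features."""
        V = voxel_feat.size(0)
        w = self.weight_gen(voxel_feat).view(V, 3, -1)  # (V, 3, out_dim)
        b = self.bias_gen(voxel_feat).unsqueeze(1)      # (V, 1, out_dim)
        point_feat = torch.relu(rel_pos @ w + b)        # (V, K, out_dim)
        return point_feat.max(dim=1).values             # max-pool over points
```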
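The hierarchical geometry pools can be pictured as one fixed-size memory bank per backbone stage. The sketch below assumes a momentum-based refresh of nearest pool entries and a k-nearest-neighbor query; both the update rule and the name `GeometryPool` are assumptions for illustration, not taken from the paper.

```python
import torch

class GeometryPool:
    """Per-stage memory bank of representative point-level features (sketch)."""

    def __init__(self, pool_size: int, feat_dim: int, momentum: float = 0.9):
        self.bank = torch.zeros(pool_size, feat_dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, point_feats: torch.Tensor) -> None:
        """point_feats: (N, feat_dim). Each incoming feature refreshes its
        nearest pool entry with an exponential moving average (assumed rule)."""
        idx = torch.cdist(point_feats, self.bank).argmin(dim=1)
        for feat, j in zip(point_feats, idx.tolist()):
            self.bank[j] = self.momentum * self.bank[j] + (1 - self.momentum) * feat

    def query(self, voxel_feats: torch.Tensor, k: int = 4) -> torch.Tensor:
        """Return the k nearest pooled features for each voxel feature."""
        idx = torch.cdist(voxel_feats, self.bank).topk(k, largest=False).indices
        return self.bank[idx]  # (V, k, feat_dim)
```

Instantiating one such pool per backbone stage yields the hierarchy, with coarser stages storing features at coarser spatial resolutions.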
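For the geometry-to-voxel auxiliary step, one plausible reading is a cross-attention fusion in which each voxel feature attends over the point-level features it retrieved from the pool. The sketch below implements that reading with a residual connection; the paper's exact fusion operator may differ.

```python
import torch
import torch.nn as nn

class GeometryToVoxelFusion(nn.Module):
    """Fuses pooled point-level features into voxel features (sketch)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feats: torch.Tensor, pooled: torch.Tensor) -> torch.Tensor:
        """voxel_feats: (V, dim); pooled: (V, k, dim) features queried from
        the geometry pool. Returns enriched voxel features of shape (V, dim)."""
        q = voxel_feats.unsqueeze(1)             # (V, 1, dim) query per voxel
        fused, _ = self.attn(q, pooled, pooled)  # attend over its pool entries
        return self.norm(voxel_feats + fused.squeeze(1))  # residual fusion
```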
The authors conduct experiments on joint multi-sensor datasets, including S3DIS, ScanNet, and SemanticKITTI, to demonstrate the effectiveness and efficiency of GeoAuxNet. The method outperforms other models trained on the joint datasets and achieves performance competitive with expert models trained on individual datasets.
Source: Shengjun Zha... et al., arxiv.org, 03-29-2024, https://arxiv.org/pdf/2403.19220.pdf