Core Concepts
With added adaptivity, sparse CNNs can match and even outperform point transformers in 3D semantic segmentation.
Introduction
Point cloud transformers revolutionized 3D recognition in the 2020s.
Sparse CNNs remain valuable due to efficiency and ease of application.
Key Components
Adaptive receptive fields and adaptive relations bridge the performance gap with point transformers.
Omni-Adaptive 3D CNNs (OA-CNNs) enhance adaptivity at minimal cost.
Comparison
OA-CNNs surpass point transformers in accuracy while incurring lower latency and memory cost.
Related Work
Point-based methods advocate direct manipulation of unstructured points.
Spatially Adaptive Receptive Fields
Different parts of a scene require varying receptive field sizes for accurate predictions.
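A minimal sketch of this idea in plain Python: context features pooled at several receptive-field scales are mixed per voxel by softmax gates. In the paper the gate scores come from a small learned network; here they are simply passed in, and all names (`adaptive_receptive_field`, `scale_scores`) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of raw scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def adaptive_receptive_field(feature, scale_feats, scale_scores):
    """Mix multi-scale context features with per-voxel softmax gates.

    feature:      the voxel's own feature vector
    scale_feats:  context features pooled at increasing receptive-field
                  sizes, one vector per scale
    scale_scores: raw per-scale selection scores for this voxel
                  (hypothetical stand-in for a learned gating network)
    """
    gates = softmax(scale_scores)
    mixed = [0.0] * len(feature)
    for g, ctx in zip(gates, scale_feats):
        for i, c in enumerate(ctx):
            mixed[i] += g * c
    # residual combine: keep the voxel's own feature, add gated context
    return [f + m for f, m in zip(feature, mixed)]
```

Intuitively, a voxel in a large flat region (e.g. a floor) would learn gates that favor the large-scale pooled context, while a voxel on a small object would favor the small-scale one.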
Adaptive Relation Convolution
ARConv dynamically generates kernel weights for each non-empty voxel based on its correlation with the centroid voxel.
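The note above can be sketched in a few lines: each voxel's kernel is produced dynamically from how strongly its feature correlates with the centroid voxel's feature. The dot-product correlation, sigmoid gate, and two-kernel blend below are simplifying assumptions for illustration, not the paper's exact parameterization.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def arconv_kernel(voxel_feat, centroid_feat, kernel_a, kernel_b):
    """Blend two base kernels into a voxel-specific kernel, gated by the
    voxel's correlation with the centroid voxel of its region.

    A high correlation pushes the gate toward kernel_a, a low one toward
    kernel_b, so voxels in the same region can still convolve differently.
    """
    gate = 1.0 / (1.0 + math.exp(-dot(voxel_feat, centroid_feat)))  # sigmoid
    return [gate * wa + (1.0 - gate) * wb
            for wa, wb in zip(kernel_a, kernel_b)]
```

The key property this preserves from the summary is that kernel weights are a function of the voxel itself, not shared constants as in a standard convolution.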
Architecture
OA-CNNs consist of sparse and submanifold convolution modules with an adaptive aggregator.
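The distinction between the two convolution types named above can be shown on active-site sets: a submanifold convolution outputs only at already-active voxels, preserving the sparsity pattern, while a (generalized) sparse convolution also activates every site reachable through a kernel offset. The function below is a schematic sketch of that difference, not the OA-CNN implementation.

```python
def conv_active_sites(active, offsets):
    """Compare output active sites of submanifold vs. sparse convolution.

    active:  set of occupied voxel coordinates (tuples)
    offsets: kernel offsets, e.g. [(0, 0), (1, 0), ...]

    Returns (submanifold_sites, sparse_sites): submanifold keeps the input
    pattern unchanged; sparse convolution dilates it by the kernel support.
    """
    active = set(active)
    sparse_sites = {tuple(a + o for a, o in zip(site, off))
                    for site in active for off in offsets}
    return active, sparse_sites
```

This is why submanifold layers are the workhorse of sparse backbones (they avoid "smearing" occupancy deeper into empty space), with strided sparse convolutions reserved for downsampling.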
Experiments
OA-CNNs outperform state-of-the-art methods on various benchmarks without pretraining or auxiliary methods.
Stats
"It achieves 76.1%, 78.9%, and 70.6% mIoU on ScanNet v2, nuScenes, and SemanticKITTI validation benchmarks respectively."
"Our method outperforms state-of-the-art methods with promising efficiency on popular benchmarks including ScanNet v2, ScanNet200, nuScenes, and SemanticKITTI semantic segmentation."