SLCF-Net: Semantic Scene Completion with LiDAR-Camera Fusion
Main Idea
SLCF-Net addresses Semantic Scene Completion (SSC) by fusing LiDAR and camera data, outperforming prior methods on SSC metrics.
Abstract
SLCF-Net is a novel method that fuses RGB images and sparse LiDAR scans to infer a 3D voxelized semantic scene. The model leverages Gaussian-decay Depth-prior Projection (GDP) for feature projection and inter-frame consistency to ensure temporal coherence. Extensive experiments on the SemanticKITTI dataset demonstrate SLCF-Net's superior performance compared to existing SSC methods.
I. Introduction
SSC aims to estimate geometry and semantics simultaneously.
RGB images provide semantic content, while depth data offers scene geometry.
SLCF-Net fuses RGB images and LiDAR scans for urban driving scenarios.
II. Related Work
Traditional methods vs. deep neural networks in SSC.
Sensor fusion techniques combining camera and LiDAR data.
Sequence learning for video understanding in SSC tasks.
III. Method
SLCF-Net processes RGB images and sparse LiDAR depth maps.
Feature projection using GDP module and inter-frame feature propagation.
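The core idea of GDP is that, instead of assigning a projected 2D feature uniformly to every voxel along a camera ray, the feature is weighted by a Gaussian centred on the LiDAR depth prior, so most of it lands near the measured surface. A minimal numpy sketch of that weighting (the function name and `sigma` parameter are illustrative, not from the paper):

```python
import numpy as np

def gaussian_decay_weights(voxel_depths, depth_prior, sigma=1.0):
    """Weight voxels along one camera ray by a Gaussian centred on the
    LiDAR depth prior (illustrative sketch of the GDP idea)."""
    w = np.exp(-0.5 * ((voxel_depths - depth_prior) / sigma) ** 2)
    return w / (w.sum() + 1e-8)  # normalise so the ray's weights sum to 1

# Example: voxels at 1 m steps along a ray, LiDAR return at 5 m
depths = np.arange(1.0, 11.0)
weights = gaussian_decay_weights(depths, depth_prior=5.0, sigma=1.0)
```

A 2D image feature multiplied by `weights` is then scattered into the 3D volume, concentrating evidence around the depth prior while still covering nearby voxels.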
IV. Evaluation
Performance comparison with other SSC baselines on the SemanticKITTI dataset.
V. Conclusions
SLCF-Net demonstrates advantages in SSC but faces a trade-off between accuracy and consistency.
SLCF-Net
Key Facts
Depth Anything Model densely estimates relative distance from an RGB image.
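Because Depth Anything predicts only relative (scale-ambiguous) depth, a common way to turn it into a usable depth prior is to fit a scale and shift against the sparse LiDAR measurements by least squares. A minimal sketch of that alignment step (the function and variable names are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def align_relative_depth(rel_depth, lidar_depth, mask):
    """Fit scale s and shift t so that s * rel_depth + t best matches the
    sparse LiDAR depth at valid pixels (least squares), then apply them."""
    x = rel_depth[mask]
    y = lidar_depth[mask]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * rel_depth + t

# Example: dense relative depth, LiDAR valid at every 10th pixel
rel = np.linspace(0.1, 1.0, 100)
lidar = 2.0 * rel + 1.0          # synthetic metric depth (s=2, t=1)
mask = np.zeros(100, dtype=bool)
mask[::10] = True
aligned = align_relative_depth(rel, lidar, mask)
```

With exact synthetic data the fit recovers the true scale and shift; on real data the sparse LiDAR points anchor the monocular prediction to metric scale.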
SLCF-Net achieves the highest accuracy in most individual classes on the SemanticKITTI dataset.
Quotes
"SLCF-Net excels in all SSC metrics."
"Our method outperforms all baselines in both SC and SSC metrics."