Key Concepts
SLCF-Net introduces a novel approach to Semantic Scene Completion (SSC) by fusing LiDAR and camera data, achieving strong performance on SSC metrics.
Summary
SLCF-Net fuses RGB images and sparse LiDAR scans to infer a 3D voxelized semantic scene. The model uses Gaussian-decay Depth-prior Projection (GDP) to lift 2D image features into the 3D volume and enforces inter-frame consistency for temporal coherence. Extensive experiments on the SemanticKITTI dataset show that SLCF-Net outperforms existing SSC methods.
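The inter-frame consistency idea above can be illustrated with a minimal sketch: features computed for the previous frame's voxel grid are carried into the current frame by mapping each current voxel center through the relative sensor pose and looking up the previous grid. This is a hypothetical nearest-neighbor version for intuition only; function names, the pose convention `T_cur_to_prev`, and the lookup scheme are assumptions, and the actual model propagates features in a learned, differentiable way.

```python
import numpy as np

def warp_voxel_features(prev_feats, T_cur_to_prev, voxel_size, origin):
    """Nearest-neighbor warp of the previous frame's voxel features into the
    current frame (hypothetical sketch of inter-frame feature propagation).

    prev_feats:    (X, Y, Z, C) feature grid from the previous frame
    T_cur_to_prev: (4, 4) homogeneous transform, current -> previous frame
    voxel_size:    edge length of a voxel in meters
    origin:        (3,) world coordinates of the grid corner
    """
    X, Y, Z, _ = prev_feats.shape
    out = np.zeros_like(prev_feats)
    # World coordinates of every current-frame voxel center.
    idx = np.stack(np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                               indexing="ij"), axis=-1)            # (X, Y, Z, 3)
    centers = origin + (idx + 0.5) * voxel_size
    homo = np.concatenate([centers, np.ones((*centers.shape[:3], 1))], axis=-1)
    # Map current voxel centers into the previous frame and index its grid.
    prev_pts = homo @ T_cur_to_prev.T
    prev_idx = np.floor((prev_pts[..., :3] - origin) / voxel_size).astype(int)
    valid = np.all((prev_idx >= 0) & (prev_idx < [X, Y, Z]), axis=-1)
    vi = prev_idx[valid]
    out[valid] = prev_feats[vi[:, 0], vi[:, 1], vi[:, 2]]
    return out
```

Voxels whose warped location falls outside the previous grid keep a zero feature, mirroring the fact that newly observed regions have no history to propagate.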
I. Introduction
- SSC aims to estimate geometry and semantics simultaneously.
- RGB images provide semantic content, while depth data offers scene geometry.
- SLCF-Net fuses RGB images and LiDAR scans for urban driving scenarios.
II. Related Work
- Traditional methods vs. deep neural networks in SSC.
- Sensor fusion techniques combining camera and LiDAR data.
- Sequence learning for video understanding in SSC tasks.
III. Method
- SLCF-Net processes RGB images and sparse LiDAR depth maps.
- Feature projection using GDP module and inter-frame feature propagation.
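The GDP step above can be sketched numerically: a 2D pixel feature is distributed into the voxels along its camera ray, with each voxel's share weighted by a Gaussian centered at the LiDAR depth prior, $w(d) = \exp\!\big(-(d - \hat d)^2 / 2\sigma^2\big)$. The function names, the `sigma` parameter, and the per-ray formulation are assumptions for illustration; the paper's module operates on full feature maps inside the network.

```python
import numpy as np

def gaussian_decay_weights(voxel_depths, depth_prior, sigma=1.0):
    """Gaussian-decay weight for each voxel depth along a camera ray,
    centered at the sparse-LiDAR depth prior (hypothetical GDP sketch)."""
    return np.exp(-((voxel_depths - depth_prior) ** 2) / (2.0 * sigma ** 2))

def project_feature_along_ray(pixel_feature, voxel_depths, depth_prior, sigma=1.0):
    """Spread one (C,) pixel feature into the N voxels its ray crosses,
    scaled by the Gaussian-decay weight of each voxel's depth."""
    w = gaussian_decay_weights(voxel_depths, depth_prior, sigma)   # (N,)
    return w[:, None] * pixel_feature[None, :]                     # (N, C)

# Example: a ray crossing voxels at depths 1..8 m, LiDAR prior at 4 m.
depths = np.arange(1.0, 9.0)
feat = np.ones(3)                       # a 3-channel pixel feature
voxel_feats = project_feature_along_ray(feat, depths, depth_prior=4.0)
# The voxel at the prior depth keeps the full feature; weights decay with
# distance from the prior, so geometry uncertainty is encoded softly.
```

Compared with projecting the feature only into the single voxel at the measured depth, the Gaussian decay tolerates noise in the sparse LiDAR prior while still concentrating evidence near it.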
IV. Evaluation
- Performance comparison with other SSC baselines on the SemanticKITTI dataset.
V. Conclusions
- SLCF-Net demonstrates advantages in SSC but faces a trade-off between accuracy and consistency.
Statistics
The Depth Anything model densely estimates relative depth from a single RGB image.
SLCF-Net achieves the highest accuracy across all individual classes on the SemanticKITTI dataset.
Quotes
"SLCF-Net excels in all SSC metrics."
"Our method outperforms all baselines in both SC and SSC metrics."