Understanding the surrounding 3D environment in real time is crucial for autonomous vehicles. The article introduces an approach that fuses front-view 2D camera images with LiDAR scans to predict 3D semantic occupancy efficiently, jointly solving scene completion and semantic segmentation for the sparse outdoor scenes typical of driving. By building on sparse convolution networks, specifically the Minkowski Engine, the model avoids the high computational cost that keeps traditional dense methods out of real-time applications.

The pipeline projects LiDAR points onto the RGB image, extracts features with EfficientNetV2, and then performs scene completion and semantic segmentation as a multi-task problem. Sparse convolutions restrict processing to the meaningful 3D points in the large, mostly empty volumes that LiDAR and cameras capture outdoors.

The model achieves competitive accuracy on benchmark datasets such as nuScenes while delivering real-time inference close to human perception rates of 20-30 frames per second (FPS).
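The first pipeline step, projecting LiDAR points onto the RGB image, can be sketched with a standard pinhole camera model. This is an illustrative reconstruction, not the paper's code: the intrinsic matrix `K`, the LiDAR-to-camera transform `T`, and the toy points are assumptions chosen for the example.

```python
import numpy as np

def project_lidar_to_image(points, K, T, image_shape):
    """Project Nx3 LiDAR points into pixel coordinates.

    K: 3x3 camera intrinsics; T: 4x4 LiDAR-to-camera extrinsic transform.
    Returns Nx2 pixel coordinates and a boolean validity mask.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])       # Nx4 homogeneous coords
    cam = (T @ homo.T).T[:, :3]                       # into the camera frame
    in_front = cam[:, 2] > 0                          # discard points behind the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                       # perspective divide
    h, w = image_shape
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & in_image

# Toy intrinsics/extrinsics for illustration only (not from the paper).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # assume LiDAR and camera frames coincide
points = np.array([[0.0, 0.0, 10.0],    # straight ahead -> lands at image center
                   [1.0, 0.0, 10.0],    # slightly to the side
                   [0.0, 0.0, -5.0]])   # behind the camera -> masked out
uv, valid = project_lidar_to_image(points, K, T, (480, 640))
```

Only the points in `valid` carry image features forward; each such point can then be tagged with the EfficientNetV2 feature vector at its pixel location.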
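The payoff of sparse convolution can be made concrete with a small numpy sketch (again an illustration, not the paper's code): a LiDAR scan samples surfaces, so only a small fraction of a dense voxel grid is ever occupied. The grid extents, voxel size, and the synthetic ground-plane "scan" below are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic scan: 100k returns from a nearly flat road surface.
xy = rng.uniform(-50.0, 50.0, size=(100_000, 2))
z = rng.normal(0.0, 0.02, size=(100_000, 1))
points = np.hstack([xy, z])

voxel_size = 0.5
origin = np.array([-50.0, -50.0, -3.0])
# Quantize points to integer voxel coordinates.
coords = np.floor((points - origin) / voxel_size).astype(np.int64)

occupied = np.unique(coords, axis=0)      # the sparse set of active coordinates
grid_shape = (200, 200, 40)               # dense grid at the same resolution
density = occupied.shape[0] / np.prod(grid_shape)

# A dense 3D CNN would convolve all 1.6M voxels of the grid; a sparse
# network such as the Minkowski Engine allocates features and compute
# only at the `occupied` coordinates.
```

This is why sparse convolution makes the scene-completion and segmentation heads tractable at real-time rates: compute scales with the number of occupied voxels rather than with the grid volume.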