In autonomous driving, understanding the surrounding 3D environment in real time is crucial. The article introduces a novel approach that fuses front-view 2D camera images with LiDAR scans to predict 3D semantic occupancy efficiently. By using sparse convolutional networks, specifically the Minkowski Engine, the model addresses the high computational cost that keeps traditional dense methods from running in real time. The focus is on jointly solving scene completion and semantic segmentation for outdoor driving scenes, whose sensor data are characteristically sparse. The proposed model achieves competitive accuracy on benchmark datasets such as nuScenes while reaching inference rates close to the 20-30 frames per second (FPS) associated with human perception.

The pipeline projects LiDAR points onto the RGB image, extracts image features with EfficientNetV2, and then performs scene completion and semantic segmentation as a joint multi-task problem. Sparse convolutional networks process only the occupied, meaningful 3D points, which suits the large and mostly empty volumes typical of LiDAR and camera data captured in outdoor environments.
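To make the data flow concrete, the sketch below illustrates the two front-end steps the summary mentions: projecting LiDAR points into the front-view image to gather per-point image features, and voxelizing the surviving points into a sparse tensor that a Minkowski sparse convolution network can consume. This is not the authors' code; it only assumes a calibrated LiDAR-camera pair and the open-source MinkowskiEngine library, and the function names, the voxel size, and the dummy calibration values are illustrative assumptions.

```python
import numpy as np
import torch
import MinkowskiEngine as ME


def project_lidar_to_image(points, K, T_cam_from_lidar, image_hw):
    """Project LiDAR points (N, 3) into the camera image plane.

    K is the 3x3 camera intrinsic matrix, T_cam_from_lidar the 4x4
    LiDAR-to-camera extrinsic transform. Returns pixel coordinates (N, 2)
    and a boolean mask of points that fall inside the image.
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])      # homogeneous (N, 4)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]   # camera frame (N, 3)
    in_front = pts_cam[:, 2] > 0.1                    # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                       # perspective divide
    h, w = image_hw
    in_view = (in_front
               & (uv[:, 0] >= 0) & (uv[:, 0] < w)
               & (uv[:, 1] >= 0) & (uv[:, 1] < h))
    return uv, in_view


def build_sparse_input(points, point_feats, voxel_size=0.2):
    """Voxelize points and wrap them in a Minkowski sparse tensor."""
    coords = torch.from_numpy(np.floor(points / voxel_size)).int()
    feats = torch.as_tensor(point_feats, dtype=torch.float32)
    # Drop duplicate voxels so each coordinate appears only once.
    coords, feats = ME.utils.sparse_quantize(coordinates=coords, features=feats)
    coords_b = ME.utils.batched_coordinates([coords])  # prepend batch index
    return ME.SparseTensor(features=feats, coordinates=coords_b)


# Example: decorate each visible LiDAR point with the RGB value it projects onto,
# then build the sparse tensor that a sparse 3D CNN would consume.
points = np.random.rand(1000, 3) * 50.0                   # dummy LiDAR points
image = np.random.rand(900, 1600, 3).astype(np.float32)   # dummy front-view image
K = np.array([[1266.0, 0.0, 800.0],
              [0.0, 1266.0, 450.0],
              [0.0, 0.0, 1.0]])                            # dummy intrinsics
T = np.eye(4)                                              # identity extrinsics for the demo

uv, mask = project_lidar_to_image(points, K, T, image.shape[:2])
u, v = uv[mask, 0].astype(int), uv[mask, 1].astype(int)
rgb_feats = image[v, u]                                    # per-point image features (M, 3)
sparse_in = build_sparse_input(points[mask], rgb_feats)
print(sparse_in.F.shape)
```

In the actual system, the per-point features would come from an EfficientNetV2 feature map rather than raw RGB values, and the sparse tensor would feed a Minkowski sparse convolution network with scene-completion and semantic-segmentation heads; the sketch only fixes the data-flow idea, not the authors' architecture.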
Key insights distilled from the source by Samuel Sze, L... on arxiv.org, 03-14-2024: https://arxiv.org/pdf/2403.08748.pdf