The paper introduces CoHFF, a collaborative semantic occupancy prediction framework for connected automated vehicles (CAVs). The key highlights are:
CoHFF is the first camera-based method for collaborative 3D semantic occupancy prediction, going beyond previous approaches that focused on 3D bounding boxes or bird's eye view segmentation.
The framework consists of two specialized pre-trained networks: one for occupancy prediction and one for semantic segmentation. It then applies a hybrid feature fusion approach to combine the task-specific features, enabling more comprehensive 3D scene understanding.
The hybrid fusion includes V2X feature fusion, which shares compressed orthogonal attention features between CAVs, and task feature fusion, which integrates semantic and occupancy information within each vehicle.
To support the evaluation of collaborative semantic occupancy prediction, the authors extend the existing OPV2V dataset with 3D semantic occupancy ground truth labels across 12 categories.
Experiments show that the collaborative approach significantly outperforms single-vehicle prediction, improving semantic occupancy prediction by over 30%. The framework also achieves comparable or better results than state-of-the-art methods in subsequent 3D perception applications such as object detection and BEV segmentation.
The authors demonstrate the robustness of CoHFF under low communication budgets: performance decreases only moderately even when the communication volume is reduced by 97%.
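
The two fusion stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the plane averaging in `compress_to_planes`, the element-wise max in `v2x_fuse`, the thresholding in `task_fuse`, the `EMPTY` label id, and all function names are assumptions chosen for illustration. The paper's fusion is attention-based and learned. The sketch also assumes that every vehicle's voxel grid has already been aligned to the ego vehicle's frame.

```python
import numpy as np

EMPTY = 0  # hypothetical label id for free space


def compress_to_planes(volume):
    """Project an (X, Y, Z) volume onto three orthogonal planes by averaging."""
    return volume.mean(axis=2), volume.mean(axis=1), volume.mean(axis=0)


def expand_from_planes(xy, xz, yz):
    """Rebuild a coarse (X, Y, Z) volume by broadcasting and averaging the planes."""
    return (xy[:, :, None] + xz[:, None, :] + yz[None, :, :]) / 3.0


def v2x_fuse(ego_occ, received_planes):
    """Stand-in for V2X feature fusion: element-wise max over the ego volume
    and the volumes rebuilt from the planes sent by other CAVs."""
    fused = ego_occ.copy()
    for planes in received_planes:
        fused = np.maximum(fused, expand_from_planes(*planes))
    return fused


def task_fuse(occ, sem_logits, threshold=0.5):
    """Stand-in for task feature fusion: occupied voxels take their argmax
    semantic class (shifted so that 0 stays EMPTY); the rest are EMPTY."""
    labels = sem_logits.argmax(axis=-1) + 1
    return np.where(occ > threshold, labels, EMPTY)


# Toy example: the ego vehicle sees nothing, a second CAV sees a full grid.
ego = np.zeros((2, 2, 2))
other = np.ones((2, 2, 2))
fused = v2x_fuse(ego, [compress_to_planes(other)])

sem_logits = np.zeros((2, 2, 2, 3))
sem_logits[..., 1] = 5.0  # class index 1 wins everywhere
semantic_occupancy = task_fuse(fused, sem_logits)
```

In this toy setup each CAV transmits three 2D planes instead of a full 3D volume, which is where the communication saving comes from in the sketch. The compression used in the paper may differ.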
Source: arxiv.org