Core Concept
The proposed BRGScene framework effectively bridges stereo geometry and bird's-eye-view (BEV) representation to achieve reliable and accurate semantic scene completion solely from stereo RGB images.
Summary
The paper proposes BRGScene, a framework that combines stereo geometry with bird's-eye-view (BEV) representations to perform semantic scene completion (SSC) from stereo RGB images alone.
Key highlights:
- Stereo matching provides explicit geometric cues to mitigate ambiguity, while BEV representation enhances hallucination ability with global semantic context.
- A Mutual Interactive Ensemble (MIE) block is designed to bridge the representation gap between stereo volume and BEV volume for fine-grained reliable perception.
- The MIE block includes a Bi-directional Reliable Interaction (BRI) module that encourages pixel-level reliable information exchange, and a Dual Volume Ensemble (DVE) module that facilitates complementary aggregation.
- BRGScene outperforms state-of-the-art camera-based methods on the SemanticKITTI benchmark, demonstrating significant improvements in both geometric completion and semantic segmentation.
- Preliminary experiments also show the effectiveness of BRGScene in the 3D object detection task on the nuScenes dataset.
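The two aspects named in the SemanticKITTI highlight above, geometric completion and semantic segmentation, are commonly scored with a class-agnostic IoU and a per-class mean IoU (mIoU), respectively. The following is a minimal sketch of both metrics for a single scene, not the benchmark's official evaluation code. It assumes predictions and ground truth are flattened lists of integer voxel labels, that label `0` means empty space, and that label `255` marks voxels to ignore; the function names are hypothetical. It also skips classes that appear in neither prediction nor ground truth, whereas a benchmark would typically accumulate counts over the whole dataset.

```python
from typing import List, Sequence

EMPTY = 0      # assumed label for free space
IGNORE = 255   # assumed label for voxels excluded from evaluation


def completion_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Class-agnostic IoU: treats every non-empty voxel as 'occupied'."""
    inter = 0
    union = 0
    for p, t in zip(pred, target):
        if t == IGNORE:
            continue
        p_occ = p != EMPTY
        t_occ = t != EMPTY
        inter += p_occ and t_occ   # both occupied
        union += p_occ or t_occ    # at least one occupied
    return inter / union if union else 0.0


def semantic_miou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over semantic classes 1..num_classes-1 (empty class excluded)."""
    ious: List[float] = []
    for c in range(1, num_classes):
        inter = 0
        union = 0
        for p, t in zip(pred, target):
            if t == IGNORE:
                continue
            inter += (p == c) and (t == c)
            union += (p == c) or (t == c)
        if union:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two semantic classes (1 and 2); values are illustrative only.
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 2, 2, 1, 255]
print(completion_iou(pred, target))    # geometry-only score
print(semantic_miou(pred, target, 3))  # semantic score
```

Reporting the two numbers separately makes it visible when a method recovers scene geometry well but assigns the wrong classes, or the reverse.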
Statistics
"Compared to VoxFormer-T, our method has a 14.5% relative improvement in terms of mIoU on the SemanticKITTI test set."
"Our proposed BRGScene efficiently outperforms all the other stereo-based methods on the SemanticKITTI validation set."
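A relative improvement such as the 14.5% quoted above is a ratio against the baseline score, not a difference in mIoU points. The sketch below shows the arithmetic under that reading. The function name is hypothetical and the scores in the example are made-up placeholders, not values reported in the paper.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Return the improvement of `ours` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (ours - baseline) / baseline * 100.0


# Placeholder scores (NOT from the paper): a baseline mIoU of 10.0 and a new
# mIoU of 11.45 differ by 1.45 points, which is 14.5% of the baseline.
baseline_miou = 10.0
our_miou = 11.45
print(round(relative_improvement(baseline_miou, our_miou), 2))
```

The distinction matters when baseline scores are low: a gain of under two absolute mIoU points can still be a double-digit relative improvement.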
Quotes
"To bridge the representation gap between stereo geometry and BEV features within a unified framework for pixel-level reliable prediction, we propose BRGScene, a framework that bridges stereo matching technique and BEV representation for fine-grained reliable SSC."
"Our framework aims to fully exploit the potential of vision inputs with explicit geometric constraint of stereo matching and global semantic context of BEV representation."