Key Concept
The proposed BRGScene framework effectively bridges stereo geometry and bird's-eye-view (BEV) representation to achieve reliable and accurate semantic scene completion solely from stereo RGB images.
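The toy Python sketch below makes the two ingredients concrete. It is not BRGScene's actual pipeline, which learns stereo cost volumes and BEV features with neural networks. The standard rectified-stereo relation Z = f·B/d turns a disparity into metric depth. Pinhole back-projection then places the pixel in a voxel grid, and collapsing the grid's vertical axis yields a BEV occupancy map. All function names, the grid layout, and the camera values are illustrative assumptions.

```python
# Toy sketch (assumption, not BRGScene's learned network): stereo disparity
# yields explicit metric depth, depth places pixels into a 3D voxel grid, and
# collapsing the grid's vertical axis produces a bird's-eye-view (BEV) map.
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]   # camera frame: x right, y down, z forward
Grid3D = List[List[List[int]]]       # indexed [ix][iz][iy]; iy is the vertical axis


def disparity_to_depth(disparity: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo relation Z = f * B / d (depth in metres)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


def backproject(u: float, v: float, depth: float,
                focal_px: float, cx: float, cy: float) -> Point:
    """Pinhole back-projection of pixel (u, v) at the given depth."""
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return x, y, depth


def voxelize(points: Iterable[Point], voxel_m: float,
             dims: Tuple[int, int, int]) -> Grid3D:
    """Mark occupied voxels; x and y are centred on the camera, z starts at it."""
    nx, nz, ny = dims
    grid = [[[0] * ny for _ in range(nz)] for _ in range(nx)]
    for x, y, z in points:
        ix = int((x + nx * voxel_m / 2) // voxel_m)
        iz = int(z // voxel_m)
        iy = int((y + ny * voxel_m / 2) // voxel_m)
        if 0 <= ix < nx and 0 <= iz < nz and 0 <= iy < ny:  # drop out-of-range points
            grid[ix][iz][iy] = 1
    return grid


def collapse_to_bev(grid: Grid3D) -> List[List[int]]:
    """BEV occupancy: a ground cell is occupied if any voxel above it is."""
    return [[max(column) for column in row] for row in grid]


# Example with illustrative values: a pixel at the principal point with 50 px
# disparity (f = 700 px, B = 0.5 m) lands 7 m straight ahead of the camera.
depth = disparity_to_depth(50.0, focal_px=700.0, baseline_m=0.5)
point = backproject(320.0, 240.0, depth, focal_px=700.0, cx=320.0, cy=240.0)
bev = collapse_to_bev(voxelize([point], voxel_m=1.0, dims=(4, 10, 4)))
```

Max-pooling over height is only one way to flatten a volume into BEV; learned BEV encoders typically pool or compress feature channels rather than binary occupancy.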
Abstract
The paper proposes BRGScene, a framework that leverages both stereo geometry and bird's-eye-view (BEV) representations to achieve reliable and accurate semantic scene completion (SSC) from stereo RGB images.
Key highlights:
- Stereo matching provides explicit geometric cues that mitigate geometric ambiguity, while the BEV representation, with its global semantic context, strengthens the ability to hallucinate (infer) occluded or unobserved regions.
- A Mutual Interactive Ensemble (MIE) block is designed to bridge the representation gap between stereo volume and BEV volume for fine-grained reliable perception.
- The MIE block includes a Bi-directional Reliable Interaction (BRI) module that encourages pixel-level reliable information exchange, and a Dual Volume Ensemble (DVE) module that facilitates complementary aggregation.
- BRGScene outperforms state-of-the-art camera-based methods on the SemanticKITTI benchmark, demonstrating significant improvements in both geometric completion and semantic segmentation.
- Preliminary experiments also show the effectiveness of BRGScene in the 3D object detection task on the nuScenes dataset.
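The summary does not give the BRI and DVE equations. The following Python is therefore only a hypothetical sketch of the ideas named in the highlights: a per-voxel exchange in both directions gated by reliability scores, followed by a per-voxel weighted ensemble of the two volumes. The gating formula, the softmax weighting, and every function name (`bri_exchange`, `dve_ensemble`, `mie_block`) are assumptions. Real implementations would operate on multi-channel feature tensors with learned layers.

```python
# Hypothetical sketch of the MIE idea (assumption: the formulas below are
# illustrative stand-ins, not the paper's BRI/DVE definitions). Both volumes
# are flattened to one scalar feature per voxel on a shared, aligned grid.
import math
from typing import List, Sequence, Tuple


def bri_exchange(stereo: Sequence[float], bev: Sequence[float],
                 stereo_conf: Sequence[float], bev_conf: Sequence[float]
                 ) -> Tuple[List[float], List[float]]:
    """Bi-directional exchange: each branch adds the other's feature, gated
    per voxel by the sender's reliability score in [0, 1]."""
    new_stereo = [s + cb * b for s, b, cb in zip(stereo, bev, bev_conf)]
    new_bev = [b + cs * s for s, b, cs in zip(stereo, bev, stereo_conf)]
    return new_stereo, new_bev


def dve_ensemble(stereo: Sequence[float], bev: Sequence[float],
                 stereo_score: Sequence[float], bev_score: Sequence[float]
                 ) -> List[float]:
    """Complementary aggregation: per-voxel softmax over the two branch
    scores, then a weighted sum of the two features."""
    fused = []
    for s, b, ls, lb in zip(stereo, bev, stereo_score, bev_score):
        m = max(ls, lb)  # subtract the max before exp for numerical stability
        ws, wb = math.exp(ls - m), math.exp(lb - m)
        fused.append((ws * s + wb * b) / (ws + wb))
    return fused


def mie_block(stereo: Sequence[float], bev: Sequence[float],
              stereo_conf: Sequence[float], bev_conf: Sequence[float]
              ) -> List[float]:
    """Interaction first, then ensemble; confidences double as ensemble scores."""
    s2, b2 = bri_exchange(stereo, bev, stereo_conf, bev_conf)
    return dve_ensemble(s2, b2, stereo_conf, bev_conf)
```

Gating by the sender's confidence means an unreliable stereo voxel (for example, in a textureless region where matching is hard) contributes little to the BEV branch, while the softmax keeps the two ensemble weights summing to one.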
Statistics
"Compared to VoxFormer-T, our method has a 14.5% relative improvement in terms of mIoU on the SemanticKITTI test set."
"Our proposed BRGScene efficiently outperforms all the other stereo-based methods on the SemanticKITTI validation set."
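The formulas below can help when reading the VoxFormer-T comparison. They assume the standard definitions: mIoU averages the per-class IoU over the C semantic classes, and a relative improvement is measured against the baseline's score rather than in absolute percentage points.

```latex
\begin{aligned}
\mathrm{IoU}_c &= \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c},
\qquad \mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{IoU}_c, \\
\Delta_{\mathrm{rel}} &= \frac{\mathrm{mIoU}_{\mathrm{BRGScene}} - \mathrm{mIoU}_{\text{VoxFormer-T}}}{\mathrm{mIoU}_{\text{VoxFormer-T}}} \times 100\%.
\end{aligned}
```

Under this definition, a 14.5% relative improvement means BRGScene's test mIoU is 1.145 times that of VoxFormer-T, so the absolute gain in mIoU points is 0.145 times the baseline's mIoU.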
Quotes
"To bridge the representation gap between stereo geometry and BEV features within a unified framework for pixel-level reliable prediction, we propose BRGScene, a framework that bridges stereo matching technique and BEV representation for fine-grained reliable SSC."
"Our framework aims to fully exploit the potential of vision inputs with explicit geometric constraint of stereo matching and global semantic context of BEV representation."