Core Concepts
The authors propose a hierarchical visual-motion fusion framework that leverages the event modality as a bridge between RGB and LiDAR, improving scene flow by fusing complementary cross-modal knowledge in homogeneous spaces.
Abstract
The paper introduces a novel approach to scene flow estimation that uses the event modality as a bridge between RGB and LiDAR. The proposed hierarchical fusion framework progressively explores the visual luminance, visual structure, and motion correlation spaces to enhance scene flow. Extensive experiments on synthetic and real datasets validate the method's effectiveness in improving all-day scene flow performance.
Scene flow estimation typically relies on a single RGB camera or LiDAR sensor and depends heavily on visual features. Existing methods adopt fusion strategies to combine cross-modal knowledge directly, but they suffer from the large modality gap between the two sensors. The proposed method instead introduces the event modality as a bridge between RGB and LiDAR, leveraging its homogeneous nature with both modalities in the visual and motion spaces.
In the visual space, event complements RGB with high-dynamic-range imaging and complements LiDAR with structural integrity. In the motion space, RGB, event, and LiDAR exhibit spatially dense, temporally dense, and spatiotemporally sparse motion correlations, respectively. The hierarchical fusion framework fuses this multimodal knowledge progressively, from the visual space to the motion space, to improve scene flow.
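As a rough illustration of the two-stage idea, the sketch below fuses event features with RGB (luminance) and with LiDAR (structure) in the visual space, then fuses the three motion-correlation volumes. This is a minimal sketch, not the authors' implementation: all module names, channel sizes, and tensor layouts are assumptions made for illustration.

```python
# Minimal sketch of hierarchical visual-to-motion fusion with event as the
# bridge modality. Module names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalVisualMotionFusion(nn.Module):
    """Fuses RGB, event, and LiDAR features first in the visual space
    (luminance and structure), then in the motion-correlation space."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # Visual space: event complements RGB (luminance) and LiDAR (structure).
        self.luminance_fusion = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1), nn.ReLU())
        self.structure_fusion = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1), nn.ReLU())
        # Motion space: combine the spatially dense (RGB), temporally dense
        # (event), and spatiotemporally sparse (LiDAR) correlation volumes.
        self.motion_fusion = nn.Sequential(
            nn.Conv2d(3 * dim, dim, kernel_size=1), nn.ReLU())
        self.flow_head = nn.Conv2d(dim, 3, kernel_size=3, padding=1)

    def forward(self, rgb_feat, event_feat, lidar_feat,
                rgb_corr, event_corr, lidar_corr):
        # Stage 1: visual-space fusion, with event bridging both modalities.
        lum = self.luminance_fusion(torch.cat([rgb_feat, event_feat], dim=1))
        struct = self.structure_fusion(torch.cat([lidar_feat, event_feat], dim=1))
        # Stage 2: motion-space fusion of the three correlation volumes.
        motion = self.motion_fusion(
            torch.cat([rgb_corr, event_corr, lidar_corr], dim=1))
        # Progressive refinement: visual cues condition the motion estimate.
        return self.flow_head(lum + struct + motion)
```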
The study compares methods on synthetic and real datasets, demonstrating the advantage of the proposed hierarchical fusion approach. By fusing complementary knowledge across modalities in homogeneous spaces, the method achieves state-of-the-art all-day scene flow performance.
Stats
EPE (end-point error): 0.084 (VisMoFlow)
ACC (accuracy): 70.34% (VisMoFlow)
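For reference, EPE is conventionally the mean Euclidean distance between predicted and ground-truth flow vectors, and ACC the fraction of points with error under a threshold. The sketch below shows these common definitions; the paper's exact evaluation protocol and threshold are assumptions here (0.05 is a widely used "strict" setting).

```python
# Common scene-flow metric definitions; the thresholds are assumptions,
# not necessarily the paper's exact evaluation protocol.
import torch

def end_point_error(pred_flow: torch.Tensor, gt_flow: torch.Tensor) -> torch.Tensor:
    """Mean Euclidean distance between predicted and ground-truth 3D flow,
    for tensors shaped (N, 3)."""
    return torch.linalg.norm(pred_flow - gt_flow, dim=-1).mean()

def accuracy(pred_flow: torch.Tensor, gt_flow: torch.Tensor,
             threshold: float = 0.05) -> torch.Tensor:
    """Fraction of points whose end-point error is below an absolute or
    relative threshold (assumed strict setting of 0.05)."""
    err = torch.linalg.norm(pred_flow - gt_flow, dim=-1)
    rel = err / torch.linalg.norm(gt_flow, dim=-1).clamp(min=1e-8)
    return ((err < threshold) | (rel < threshold)).float().mean()
```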
Quotes
"The proposed hierarchical fusion can explicitly fuse the multimodal knowledge to progressively improve scene flow from visual space to motion space."
"We bring the auxiliary event as a bridge between RGB and LiDAR modalities."