Modality-Incomplete Scene Segmentation (MISS) is studied as a comprehensive task that covers both system-level missing modalities and sensor-level modality errors in multi-modal semantic segmentation. A Missing-aware Modal Switch (MMS) training strategy and a Fourier Prompt Tuning (FPT) method are proposed to address these challenges, enabling efficient and robust multi-modal perception.
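The general idea of training with modalities randomly switched off can be sketched as follows. This is only an illustration of modality dropout under stated assumptions, not the MMS strategy itself, whose details are not given in this summary. The helper name `switch_modalities`, the `drop_prob` parameter, and the dictionary-based sample layout are all hypothetical.

```python
import random


def switch_modalities(sample, modalities, drop_prob=0.5, rng=random):
    """Return a copy of `sample` with some modalities replaced by None.

    Hypothetical helper, not the paper's method: each modality is dropped
    independently with probability `drop_prob`, and at least one modality
    is always kept so the model never receives an empty input.
    """
    kept = [m for m in modalities if rng.random() >= drop_prob]
    if not kept:
        # Everything was dropped: fall back to one randomly chosen modality.
        kept = [rng.choice(modalities)]
    return {m: (sample[m] if m in kept else None) for m in modalities}


if __name__ == "__main__":
    # Placeholder data standing in for an RGB image and a depth map.
    sample = {"rgb": [0.1, 0.2], "depth": [1.5, 2.5]}
    print(switch_modalities(sample, ["rgb", "depth"], rng=random.Random(0)))
```

Applied once per training step, a switch of this kind exposes the model to every modality combination it may face at test time, which is the usual motivation for missing-aware training.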
Sigma, a Siamese Mamba network, fuses information from multiple modalities such as RGB, thermal, and depth to achieve superior performance in semantic segmentation tasks while maintaining high computational efficiency.
UniBEV, a multi-modal 3D object detection framework, is designed to be robust against missing sensor modalities by using uniform BEV encoders and a fusion module that can handle varying input combinations.
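One simple way a fusion step can accept varying input combinations is to average only the features that are present. The sketch below assumes each modality has already been encoded into a feature vector of the same shape, with `None` marking a missing sensor. Plain averaging is a stand-in chosen for illustration, and `fuse_available` is a hypothetical name; the summary does not specify UniBEV's actual fusion module.

```python
def fuse_available(features):
    """Average the feature vectors of the modalities that are present.

    `features` maps a modality name to a list of floats, or to None when
    that sensor is missing. Assumes all present vectors share one shape.
    """
    present = [f for f in features.values() if f is not None]
    if not present:
        raise ValueError("at least one modality is required")
    length = len(present[0])
    if any(len(f) != length for f in present):
        raise ValueError("all features must share the same shape")
    # Element-wise mean over the modalities that are actually available.
    return [sum(vals) / len(present) for vals in zip(*present)]


if __name__ == "__main__":
    print(fuse_available({"camera": [1.0, 3.0], "lidar": [3.0, 5.0]}))
    print(fuse_available({"camera": [1.0, 3.0], "lidar": None}))
```

Because the output has the same shape regardless of how many inputs are present, the downstream detection head needs no change when a sensor drops out.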