Core Concepts
This research paper introduces EVSNet, a novel framework that leverages event cameras to improve the accuracy and temporal consistency of video semantic segmentation in low-light conditions.
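To make the core idea concrete, the sketch below shows one generic way an event branch could be fused with an image branch before a segmentation head. This is purely illustrative: the module names, the add-based fusion, and all layer sizes are assumptions for demonstration and are not EVSNet's published architecture.

```python
import torch
import torch.nn as nn

class ToyEventGuidedSegmenter(nn.Module):
    """Illustrative only: fuse image and event features for segmentation.

    The encoders, the add-based fusion, and all layer sizes are assumptions
    for demonstration; they are not EVSNet's actual design.
    """

    def __init__(self, num_classes: int = 19, channels: int = 32):
        super().__init__()
        # Image branch: encodes a (possibly low-light) RGB frame.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Event branch: encodes an event representation
        # (e.g. a 2-channel polarity histogram).
        self.event_encoder = nn.Sequential(
            nn.Conv2d(2, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Per-pixel class scores, upsampled back to input resolution.
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, frame: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        fused = self.image_encoder(frame) + self.event_encoder(events)
        logits = self.head(fused)
        return nn.functional.interpolate(
            logits, size=frame.shape[-2:], mode="bilinear", align_corners=False
        )

model = ToyEventGuidedSegmenter()
frame = torch.rand(1, 3, 128, 256)   # low-light RGB frame
events = torch.rand(1, 2, 128, 256)  # event histogram (2 polarity channels)
print(model(frame, events).shape)    # torch.Size([1, 19, 128, 256])
```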
Stats
EVSNet achieves an mIoU of 23.6, 26.7, 28.2, and 34.1 using the AFFormer-Tiny, AFFormer-Base, MiT-B0, and MiT-B1 backbones, respectively, on the low-light VSPW dataset.
The mIoU increases by 54% and mVC16 increases by 7% at a similar model size on the low-light VSPW dataset.
EVSNet achieves an mIoU of 57.9, 60.9, 59.6, and 63.2 using the AFFormer-Tiny, AFFormer-Base, MiT-B0, and MiT-B1 backbones, respectively, on the low-light Cityscapes dataset.
The mIoU increases by 26% at a similar model size on the low-light Cityscapes dataset.
EVSNet achieves an mIoU of 53.9 and 55.2 using the MiT-B0 and MiT-B1 backbones, respectively, on the NightCity dataset.
The mIoU increases by 1% with only one-third of the model size on the NightCity dataset.
Quotes
"Event cameras asynchronously measure sparse data streams at high temporal resolution (10µs vs 3ms), higher dynamic range (140dB vs 60dB), and significantly lower energy (10mW vs 3W) compared to conventional cameras."
"The event modality, characterized by its ability to capture dynamic changes (such as motion and sudden illumination alterations) in the scene, offers valuable structural and motional information that is not captured by conventional cameras."