Key Concepts
The core message of this work is that self-supervised object motion information from unlabeled videos can be leveraged as complementary guidance to facilitate cross-domain alignment for semantic segmentation tasks, without requiring any target domain annotations.
Summary
This paper proposes a novel motion-guided unsupervised domain adaptation (MoDA) method for semantic segmentation. The key contributions are:
- MoDA utilizes self-supervised object motion information learned from unlabeled video frames as cues to guide cross-domain alignment, without requiring any target domain annotations. This contrasts with existing domain adaptation methods that rely on adversarial learning or self-training with noisy target pseudo-labels.
- MoDA consists of two key modules:
  - The object discovery module takes the instance-level motion masks and extracts accurate moving object masks.
  - The semantic mining module then uses these moving object masks to refine the target pseudo-labels, which are subsequently used to update the segmentation network.
- Experiments on domain adaptive video and image segmentation benchmarks show that MoDA outperforms existing methods that use optical flow information for temporal consistency. MoDA also complements existing state-of-the-art unsupervised domain adaptation approaches.
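The semantic mining step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the confidence threshold, and the per-mask class assignment are all assumptions made for clarity. The idea shown is that inside a discovered moving-object mask, low-confidence pseudo-labels are replaced by the class associated with that motion instance.

```python
import numpy as np

def refine_pseudo_labels(pseudo_labels, confidence, motion_masks,
                         motion_classes, conf_thresh=0.9):
    """Hypothetical sketch of motion-guided pseudo-label refinement.

    pseudo_labels: (H, W) int array of predicted classes on a target frame
    confidence:    (H, W) float array of prediction confidences
    motion_masks:  list of (H, W) bool arrays, one per moving object
    motion_classes: class id assigned to each motion instance
    """
    refined = pseudo_labels.copy()
    for mask, cls in zip(motion_masks, motion_classes):
        # Inside a moving-object region, overwrite only the
        # low-confidence predictions with the motion-derived class.
        low_conf = mask & (confidence < conf_thresh)
        refined[low_conf] = cls
    return refined
```

The refined map would then serve as the training target when updating the segmentation network, as in standard self-training.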
The key insight is that self-supervised object motion provides stronger guidance for domain alignment compared to optical flow, as it can capture 3D motion patterns that are crucial for real-world dynamic scenes with multiple moving objects.
Statistics
The paper reports the following key metrics:
On the VIPER→Cityscapes-Seq benchmark, MoDA achieves 49.1% mIoU, outperforming the DACS+OFR baseline at 46.1% mIoU.
On the GTA5→Cityscapes-Seq benchmark, MoDA achieves 54.9% mIoU, outperforming the DACS+OFR baseline at 52.5% mIoU.
MoDA also complements existing state-of-the-art approaches, e.g. improving HRDA from 73.9% to 75.2% mIoU on GTA5→Cityscapes-Seq.
Quotes
"MoDA harnesses the self-supervised object motion cues to facilitate cross-domain alignment for segmentation task."
"MoDA shows the effectiveness utilizing object motion as guidance for domain alignment compared with optical flow information."
"MoDA is versatile as it complements existing state-of-the-art UDA approaches."