Learning Temporal Contexts for Efficient Video Semantic Segmentation
This paper proposes an efficient technique called Coarse-to-Fine Feature Mining (CFFM) that jointly learns local temporal contexts, including static and motional contexts, for video semantic segmentation. It also introduces an extension, CFFM++, which further exploits global temporal contexts from the whole video to enhance segmentation performance.
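The abstract does not spell out how coarse-to-fine feature mining is implemented. As a rough, hypothetical illustration of the general idea of aggregating temporal context at coarser granularity for more distant frames, the sketch below pools past-frame features more aggressively the further back they are, then lets target-frame tokens attend over the pooled context via scaled dot-product attention. All function names, the pooling scheme, and the toy data are assumptions for illustration, not the authors' method.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def avg_pool(tokens, stride):
    # coarse context (assumed scheme): average groups of `stride` tokens
    # from a past frame; larger stride = coarser summary
    return [
        [sum(t[d] for t in tokens[i:i + stride]) / len(tokens[i:i + stride])
         for d in range(len(tokens[0]))]
        for i in range(0, len(tokens), stride)
    ]

def cross_frame_attention(queries, contexts):
    # each target-frame token attends over pooled past-frame tokens
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qd * kd for qd, kd in zip(q, k)) / math.sqrt(dim)
                  for k in contexts]
        w = softmax(scores)
        out.append([sum(wi * k[d] for wi, k in zip(w, contexts))
                    for d in range(dim)])
    return out

# toy example: one target frame and two past frames with 2-D tokens
past_frames = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],  # more distant frame
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [0.0, 0.0]],  # closer frame
]
target = [[1.0, 0.0], [0.0, 1.0]]

# coarser pooling for the more distant frame, finer for the recent one
contexts = avg_pool(past_frames[0], 4) + avg_pool(past_frames[1], 2)
enriched = cross_frame_attention(target, contexts)
print(enriched)
```

In this sketch the distant frame contributes a single heavily pooled token while the recent frame contributes two, mirroring the coarse-to-fine intuition of spending less capacity on temporally distant context.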