Core Concepts
Semantic Flow learns semantic representations of dynamic scenes from continuous flow features that capture rich 3D motion information, enabling various applications such as instance-level scene editing, semantic completion, dynamic scene tracking, and semantic adaptation on novel scenes.
Abstract
The paper proposes Semantic Flow, a neural semantic representation for dynamic scenes from monocular videos. Unlike previous NeRF-based methods that reconstruct dynamic scenes from the colors and volume densities of individual points, Semantic Flow learns semantics from continuous flows that contain rich 3D motion information.
To address the 2D-to-3D ambiguity that arises when extracting 3D flow features from 2D video frames, the model treats the volume densities as opacity priors that describe how much each flow feature contributes to the semantics observed on the frames. Specifically:
- A flow network is first used to predict flows in the dynamic scene.
- A flow feature aggregation module extracts flow features from video frames by using the locations of points on the flows as indexes.
- A flow attention module then extracts motion information from the flow features.
- A semantic network outputs semantic logits of the flows, which are then integrated with volume densities in the viewing direction to supervise the flow features with semantic labels on video frames.
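The last step above, integrating semantic logits with volume densities along the viewing direction, can be sketched as follows. This is a minimal illustration that assumes the standard NeRF-style volume rendering quadrature; the summary does not give the paper's exact formulation, and the names `composite_semantic_logits`, `densities`, `deltas`, and `logits` are hypothetical.

```python
import math


def composite_semantic_logits(densities, deltas, logits):
    """Integrate per-sample semantic logits along one ray.

    densities: volume density at each sample on the ray (assumed non-negative)
    deltas:    distance between consecutive samples
    logits:    per-sample list of class logits
    Assumes standard NeRF-style quadrature, not necessarily the paper's exact form.
    """
    num_classes = len(logits[0])
    rendered = [0.0] * num_classes
    transmittance = 1.0  # fraction of the ray that reaches the current sample
    for sigma, delta, sample_logits in zip(densities, deltas, logits):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        weight = transmittance * alpha          # contribution of this sample
        for c in range(num_classes):
            rendered[c] += weight * sample_logits[c]
        transmittance *= 1.0 - alpha            # light left for later samples
    return rendered


def softmax(values):
    """Turn rendered logits into class probabilities for a pixel."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Two samples on a ray: an empty one, then a nearly opaque one.
pixel_logits = composite_semantic_logits(
    densities=[0.0, 50.0],
    deltas=[0.1, 0.1],
    logits=[[5.0, 0.0], [0.0, 3.0]],
)
pixel_probs = softmax(pixel_logits)
```

In this sketch the empty sample receives zero weight, so the rendered logits are dominated by the opaque sample. That is the sense in which densities act as opacity priors: they decide which points along the ray are supervised by the 2D label.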
The model is evaluated on a new Semantic Dynamic Scene dataset, showing its ability to learn semantics from multiple dynamic scenes and support various applications such as instance-level scene editing, semantic completion, dynamic scene tracking, and semantic adaptation on novel scenes.
Stats
The volume densities σ_dy provide important priors for integrating the semantic logits of dynamic objects.
The flow attention module extracts motion information from flow features, improving performance on the mIoU metric.
Using the semantic consistency constraint L_consist helps render the semantic field with finer details.
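For reference, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The sketch below uses the common definition over flat label lists; the function name `mean_iou` is hypothetical, and skipping classes absent from both inputs is an assumed convention, not one taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither input (assumed convention)
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```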
Quotes
"In this work, we pioneer Semantic Flow, a neural semantic representation of dynamic scenes from monocular videos."
"To learn semantics in dynamic radiance fields, an intuitive solution is to add a semantic segmentation head to the previous dynamic NeRF methods. In this way, the semantics of each point is estimated from the position, timestamp, and other information related to the point."
"Due to the lack of motion information, predicting semantics from points forces the model to overfit to the training views in the current scene, which limits its generalization performance when training with few annotated labels or transferring to novel scenes."