Key Concepts
DyST: a new model proposed for capturing 3D structure and dynamic scenes
Abstract
Introduction:
Visual understanding beyond individual images.
DyST captures 3D structure and dynamics from real-world videos.
Related Work:
Advances in generative modeling of 3D visual scenes.
Learning global latent neural scene representations.
Method:
A dynamic scene consists of images, each determined by a camera pose and the scene dynamics.
Neural scene representations encode the input views into a set-based latent representation Z.
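As a rough illustration of what "set-based" means here, the sketch below pools patch tokens from several input views into one collection Z. It is a toy stand-in, not the model's encoder: the function names `patch_tokens` and `encode_views`, the raw-pixel "features", and the patch size are all assumptions made for this example, whereas the actual encoder is learned.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def patch_tokens(image: Image, patch: int) -> List[List[float]]:
    """Split one image into non-overlapping patch x patch tiles, each flattened to a vector."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


def encode_views(views: List[Image], patch: int = 2) -> List[List[float]]:
    """Hypothetical encoder stand-in: gather the tokens of all views into one set Z."""
    z: List[List[float]] = []
    for view in views:
        z.extend(patch_tokens(view, patch))
    return z


# Two 4x4 views with 2x2 patches give 4 tokens per view, so Z holds 8 tokens.
views = [[[float(r * 4 + c) for c in range(4)] for r in range(4)] for _ in range(2)]
Z = encode_views(views)
print(len(Z), len(Z[0]))
```

The point of the sketch is only that Z grows with the number of views and patches and carries no fixed image-grid layout; a decoder would attend over its elements rather than index into them.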
Sim-to-real Transfer:
Synthetic dataset DySO used for training and evaluation.
Co-training on synthetic and real-world videos for dynamic scene representations.
Experiments:
Novel view synthesis capabilities tested on DySO and SSv2 datasets.
Analysis of the learned control latents for camera pose and scene dynamics.
Ablation Study:
Demonstrates that latent control swap is important for separating camera pose from scene dynamics.
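The summary does not spell out how the swap works, so the following is one plausible reading, sketched under stated assumptions: a synthetic scene is rendered on a grid of camera poses and dynamics states, the camera control latent is estimated from a view that shares only the target's camera, and the dynamics control latent from a view that shares only the target's dynamics. The function name `sample_control_swap` and the dictionary keys are hypothetical.

```python
import random
from typing import Dict, Tuple

View = Tuple[int, int]  # (camera index, dynamics index) on the assumed rendering grid


def sample_control_swap(num_cameras: int, num_dynamics: int,
                        rng: random.Random) -> Dict[str, View]:
    """Pick a target view plus the two source views used to estimate its control latents.

    Assumption: every (camera, dynamics) combination of the scene is available.
    """
    if num_cameras < 2 or num_dynamics < 2:
        raise ValueError("need at least two cameras and two dynamics states to swap")

    cam = rng.randrange(num_cameras)
    dyn = rng.randrange(num_dynamics)
    other_cam = rng.choice([c for c in range(num_cameras) if c != cam])
    other_dyn = rng.choice([d for d in range(num_dynamics) if d != dyn])

    return {
        "target": (cam, dyn),
        # Same camera as the target, different dynamics: informs only the camera latent.
        "camera_source": (cam, other_dyn),
        # Same dynamics as the target, different camera: informs only the dynamics latent.
        "dynamics_source": (other_cam, dyn),
    }


sample = sample_control_swap(num_cameras=5, num_dynamics=5, rng=random.Random(0))
print(sample)
```

Under this reading, neither source view alone matches the target, so a model cannot copy the target through a single control latent and has to route camera and dynamics information separately.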
Statistics
Published as a conference paper at ICLR 2024
arXiv:2310.06020v2 [cs.CV] 15 Mar 2024