Core Concepts
Proposes a self-supervised method to jointly learn 3D motion and depth from monocular videos, so that depth estimation and 3D motion estimation benefit each other.
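To make the self-supervised signal concrete, here is a minimal sketch (not the authors' code; function and variable names are illustrative) of the view-synthesis objective that self-supervised monocular depth methods commonly optimize: predicted depth and relative camera motion warp a source frame into the target view, and the photometric error between the warped and real target frames supervises both networks.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, tgt_depth, T_tgt_to_src, K, K_inv):
    """Warp src_img into the target view using target depth and relative pose.

    src_img:       (B, 3, H, W) source RGB frame
    tgt_depth:     (B, 1, H, W) predicted depth of the target frame
    T_tgt_to_src:  (B, 4, 4) relative pose from target to source camera
    K, K_inv:      (B, 3, 3) camera intrinsics and their inverse
    """
    B, _, H, W = src_img.shape
    # Pixel grid in homogeneous coordinates: (B, 3, H*W)
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=src_img.dtype, device=src_img.device),
        torch.arange(W, dtype=src_img.dtype, device=src_img.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D in the target camera, then move to the source camera.
    cam_points = (K_inv @ pix) * tgt_depth.view(B, 1, -1)                 # (B, 3, H*W)
    ones = torch.ones(B, 1, H * W, dtype=src_img.dtype, device=src_img.device)
    src_points = (T_tgt_to_src @ torch.cat([cam_points, ones], dim=1))[:, :3]

    # Project into the source image and normalize to [-1, 1] for grid_sample.
    proj = K @ src_points
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    uv = uv.view(B, 2, H, W).permute(0, 2, 3, 1)
    u = uv[..., 0] / (W - 1) * 2 - 1
    v = uv[..., 1] / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1)
    return F.grid_sample(src_img, grid, align_corners=True, padding_mode="border")

def photometric_loss(tgt_img, warped_src):
    # Simple L1 photometric error; real systems typically add SSIM and masking.
    return (tgt_img - warped_src).abs().mean()
```

In practice the relative pose would be composed from the predicted ego-motion and object motion, and the loss is minimized jointly over the depth and motion networks.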
Abstract
The paper discusses the limitations of existing self-supervised depth estimation methods and proposes a new framework, DO3D, to address them. It introduces a hybrid Transformer-and-CNN model for depth estimation and a motion estimation module with object-wise rigid and non-rigid motion prediction. The system jointly models 3D motion and scene geometry to achieve accurate depth and motion estimation.
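The following sketch illustrates, under stated assumptions, the kind of motion decomposition the abstract describes; the module structure, head names, and parameterization are hypothetical and not the released DO3D code. A scene point's total 3D motion is composed of camera ego-motion, an object-wise rigid term gathered from instance masks, and a dense non-rigid residual predicted per pixel.

```python
import torch
import torch.nn as nn

class DecomposedMotionSketch(nn.Module):
    """Illustrative decomposition: ego-motion + object-wise rigid motion + non-rigid residual."""

    def __init__(self, feat_dim=64, max_objects=8):
        super().__init__()
        # Hypothetical heads: one 6-DoF pose per object slot (only the translation
        # part is used below, rotation is omitted for brevity) and a per-pixel
        # 3D residual flow.
        self.object_pose_head = nn.Linear(feat_dim, max_objects * 6)
        self.nonrigid_head = nn.Conv2d(feat_dim, 3, kernel_size=3, padding=1)

    def forward(self, feat_map, obj_masks, ego_translation):
        """
        feat_map:        (B, C, H, W) shared motion features
        obj_masks:       (B, K, H, W) soft instance masks over K object slots
        ego_translation: (B, 3) translation part of the predicted ego-motion
        Returns a dense (B, 3, H, W) 3D motion field.
        """
        B, C, H, W = feat_map.shape
        pooled = feat_map.mean(dim=(2, 3))                            # (B, C)
        obj_params = self.object_pose_head(pooled).view(B, -1, 6)
        obj_translations = obj_params[..., :3]                        # (B, K, 3)

        # Spread each object's rigid translation over its instance mask.
        rigid = torch.einsum("bkhw,bkc->bchw", obj_masks, obj_translations)

        # Dense non-rigid residual captures deviations from rigid object motion.
        nonrigid = self.nonrigid_head(feat_map)                       # (B, 3, H, W)

        ego = ego_translation.view(B, 3, 1, 1).expand(B, 3, H, W)
        return ego + rigid + nonrigid
```

The design choice this illustrates is that rigid object motion is shared across all pixels of an instance, while the non-rigid head only needs to model small per-pixel corrections.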
Statistics
Our model achieves an absolute relative depth error (abs rel) of 0.099 on the KITTI benchmark, outperforming all compared research works.
On the depth estimation task, our model delivers superior performance in all evaluated settings.
The depth estimation model outperforms all compared research works in the high-resolution setting.
Quotes
"Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion."
"Our model delivers superior performance in all evaluated settings, outperforming all compared research works in the high-resolution setting."