Core Concepts
The authors propose a self-supervised depth and pose estimation model, DualRefine, that tightly couples depth and pose estimation through a feedback loop. The model iteratively refines depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry, and uses the refined depth estimates and feature maps to compute pose updates at each step.
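As a rough illustration of sampling match candidates along an epipolar line, the sketch below back-projects one target pixel at several candidate depths around the current estimate and reprojects each 3D point into the source view using the current relative pose. The intrinsics, pose values, depth offsets, and the function name `epipolar_candidates` are all hypothetical; this is a minimal geometric sketch, not the authors' implementation.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def epipolar_candidates(pixel, depth, fx, fy, cx, cy, R, t, offsets):
    """Project one target pixel into the source view at several candidate depths.

    R and t map target-camera coordinates to source-camera coordinates
    (the current pose estimate). Every projected point lies on the
    epipolar line of `pixel` in the source image.
    """
    u, v = pixel
    # Back-project the pixel to a unit-depth ray in the target camera frame.
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]
    candidates = []
    for off in offsets:
        d = depth + off                      # candidate depth around the estimate
        point_tgt = [d * r for r in ray]     # 3D point in the target frame
        p = mat_vec(R, point_tgt)
        x, y, z = [p[i] + t[i] for i in range(3)]  # 3D point in the source frame
        # Pinhole projection into the source image.
        candidates.append((fx * x / z + cx, fy * y / z + cy))
    return candidates


# Hypothetical values: identity rotation and a small sideways translation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.5, 0.0, 0.0]
samples = epipolar_candidates(
    pixel=(320.0, 240.0), depth=10.0,
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
    R=R, t=t, offsets=[-2.0, -1.0, 0.0, 1.0, 2.0],
)
```

In a full model, features would be sampled at these locations and compared with the target pixel's feature to form matching costs. Because the candidates depend on `R` and `t`, the sampled line moves whenever the pose estimate is updated.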
Abstract
The authors propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. The key components are:
- Iterative update module:
  - Samples candidate matches along the epipolar line, which evolves with the current pose estimate
  - Uses the sampled matching costs to infer per-pixel confidences, which are then used to compute depth refinements
  - Uses the updated depth estimates in direct feature-metric alignment to refine the pose estimate towards convergence
- Deep equilibrium (DEQ) framework:
  - Allows the depth and pose updates to reach a fixed point through iterative refinement
  - Enables memory-efficient training, since gradients need not be saved for the operations that precede the fixed point
- Experiments:
  - Achieves competitive depth and odometry prediction performance on the KITTI dataset, surpassing published self-supervised baselines
  - Demonstrates improved global consistency of visual odometry results compared to other learning-based models
The authors show that their approach of tightly coupling depth and pose estimation, and iteratively refining them using epipolar geometry and direct alignments, leads to improved performance in both tasks compared to prior self-supervised methods.
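The fixed-point behaviour described for the DEQ framework can be sketched with a scalar toy problem. The update `f(z, theta) = 0.5 * z + theta` is a stand-in chosen purely for illustration (an assumption, not the model's update module). The forward pass iterates until the state stops changing, and the gradient with respect to `theta` is obtained from the implicit function theorem at the fixed point alone, so no intermediate iterates need to be stored.

```python
def f(z, theta):
    """Toy contractive update; a stand-in for one refinement step."""
    return 0.5 * z + theta


def solve_fixed_point(theta, z0=0.0, tol=1e-10, max_iter=1000):
    """Iterate z <- f(z, theta) until the change falls below tol."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, theta)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_gradient(df_dz, df_dtheta):
    """Return dz*/dtheta using the implicit function theorem.

    Differentiating z* = f(z*, theta) gives
    dz*/dtheta = (df/dtheta) / (1 - df/dz), evaluated at z*.
    """
    return df_dtheta / (1.0 - df_dz)


theta = 3.0
z_star = solve_fixed_point(theta)  # analytic fixed point is 2 * theta
# For this toy f, df/dz = 0.5 and df/dtheta = 1 everywhere.
grad = implicit_gradient(df_dz=0.5, df_dtheta=1.0)
```

For this toy function the fixed point is `2 * theta`, so the implicit gradient of 2 matches the analytic derivative. In a real network the scalar division becomes a linear solve involving the Jacobian of the update at the fixed point.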
Stats
The authors report the following key metrics on the KITTI dataset:
Depth estimation:
- Absolute Relative Error (Abs Rel): 0.087
- Squared Relative Error (Sq Rel): 0.698
- Root Mean Squared Error (RMSE): 4.234
- Accuracy under threshold δ1: 0.914
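For readers unfamiliar with these depth metrics, the sketch below computes them using the definitions commonly adopted in KITTI depth evaluations. The summary does not spell out the formulas, so treating these as the definitions used here is an assumption, as is reading δ1 as the fraction of pixels with max(pred/gt, gt/pred) < 1.25. The sample depths are made up.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular depth metrics over paired positive depth values."""
    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # delta_1: fraction of pixels whose ratio to ground truth is below 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta1": delta1}


# Hypothetical depths, for illustration only.
pred = [10.0, 22.0, 30.0, 8.0]
gt = [10.0, 20.0, 20.0, 8.0]
metrics = depth_metrics(pred, gt)
```

Lower is better for Abs Rel, Sq Rel, and RMSE, while higher is better for δ1.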
Visual odometry:
- Translation error (terr) on Seq 09: 3.43%
- Rotation error (rerr) on Seq 09: 1.04°/100 m
- Absolute Trajectory Error (ATE) on Seq 09: 5.18 m
- Translation error (terr) on Seq 10: 6.80%
- Rotation error (rerr) on Seq 10: 1.13°/100 m
- Absolute Trajectory Error (ATE) on Seq 10: 10.85 m
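As a simplified illustration of what an absolute trajectory error measures, the sketch below computes the root-mean-square position error between an estimated and a ground-truth trajectory. The summary does not state the alignment protocol, so this assumes the two trajectories are already expressed in the same frame and scale. The function name `ate_rmse` and the sample positions are hypothetical.

```python
import math


def ate_rmse(est_positions, gt_positions):
    """RMSE of per-frame position error between two pre-aligned trajectories."""
    sq_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(est_positions, gt_positions)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical 3D camera positions, for illustration only.
gt_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0), (2.0, 4.0, 0.0)]
ate = ate_rmse(est_traj, gt_traj)
```

Monocular methods typically require a scale (and often a rigid) alignment step before this computation, which is omitted here.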