The paper presents a novel method for transferring the depth-scale from source datasets with ground-truth (GT) depth labels to target datasets without any depth measurements. The key insights are:
Self-supervised depth estimators produce up-to-scale depth predictions that are linearly correlated with their absolute GT depth values across the domain. This linear relationship can be modeled using a single scalar factor.
Aligning the field-of-view (FOV) of the source and target datasets before training results in a shared linear depth ranking scale between the domains.
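The summary does not say how the FOV alignment is carried out. As a rough illustration only, the sketch below assumes pinhole cameras and aligns the horizontal FOV by center-cropping the wider-FOV images. The function names and the intrinsics are hypothetical and are not taken from the paper.

```python
import math

def horizontal_fov(width_px, fx_px):
    """Horizontal field of view (radians) of a pinhole camera."""
    return 2.0 * math.atan(width_px / (2.0 * fx_px))

def crop_width_for_fov(fx_px, target_fov_rad):
    """Width (pixels) of a center crop that spans target_fov_rad
    for a camera with focal length fx_px."""
    return 2.0 * fx_px * math.tan(target_fov_rad / 2.0)

# Hypothetical intrinsics (assumed values, for illustration only).
src_w, src_fx = 1920, 1000.0   # source camera: wider FOV
tgt_w, tgt_fx = 1280, 1000.0   # target camera: narrower FOV

tgt_fov = horizontal_fov(tgt_w, tgt_fx)
crop_w = crop_width_for_fov(src_fx, tgt_fov)

# Center-crop bounds on the source image, in pixel columns.
left = (src_w - crop_w) / 2.0
right = left + crop_w
```

Cropping only works when the source FOV is at least as wide as the target FOV. In the opposite case the target images would be cropped instead, and either set may then need resizing to a common resolution.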
The method first trains the depth network using self-supervision on a mix of source and target images (with FOV alignment). It then estimates the depth-scale factor by fitting a linear model between the source up-to-scale predictions and their GT depths. Finally, this factor is used to scale the target up-to-scale predictions, achieving absolute depth estimates on the new domain.
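The scale-estimation and transfer steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the single scalar factor is obtained by a least-squares fit through the origin, which is one of several reasonable estimators. The names `fit_depth_scale` and `apply_depth_scale` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_depth_scale(pred, gt, min_depth=1e-3):
    """Fit a single scalar s minimizing ||s * pred - gt||^2 over valid pixels.

    Assumption: least squares through the origin; the paper's exact
    estimator may differ.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    # Keep only pixels that have a usable GT depth measurement.
    valid = np.isfinite(gt) & np.isfinite(pred) & (gt > min_depth)
    p, g = pred[valid], gt[valid]
    return float(np.dot(p, g) / np.dot(p, p))

def apply_depth_scale(pred, scale):
    """Convert up-to-scale predictions to absolute depth."""
    return scale * np.asarray(pred, dtype=np.float64)

# Synthetic source data: predictions are exactly GT / 5 (true scale = 5).
rng = np.random.default_rng(0)
source_gt = rng.uniform(1.0, 80.0, size=1000)
source_pred = source_gt / 5.0

scale = fit_depth_scale(source_pred, source_gt)

# Target domain: no GT available, only up-to-scale predictions.
target_pred = np.array([0.5, 2.0, 10.0])
target_abs = apply_depth_scale(target_pred, scale)
```

The factor is estimated on source data only and then reused unchanged on the target predictions. This relies on the shared depth scale that FOV-aligned joint training is reported to provide.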
The proposed method was successfully demonstrated on the KITTI, DDAD and nuScenes datasets, using various existing real or synthetic source datasets, achieving comparable or better accuracy than other depth-scale transfer methods that do not use target GT depths.
Key insights distilled from the paper by Alexandra Da... at arxiv.org, 04-16-2024: https://arxiv.org/pdf/2303.07662.pdf