Core Concepts
Explicitly leveraging edge information is critical for producing high-quality monocular depth maps with clear edges and details.
Abstract
The paper presents ECFNet, a monocular depth estimation (MDE) method that estimates high-quality depth maps with accurate edges and details from a single RGB image. The key insights are:
Edge information plays a critical role in producing depth maps with clear edges and details. The authors find that existing MDE networks struggle to capture all the edge information due to the smoothing effect of convolution layers, downsampling, and upsampling.
To address this, ECFNet first extracts an edge map and an edge-highlighted image from the input, and then uses a pre-trained MDE network to infer initial depth maps from these three inputs (original image, edge map, edge-highlighted image).
ECFNet then uses a Layered Fusion Module (LFM) to fuse the three initial depth maps, leveraging the strengths of each input. LFM uses a hybrid edge detection strategy to obtain high-quality edges.
However, the fused depth from LFM may still have an inaccurate overall structure and an inconsistent depth range or distribution. To fix these problems, ECFNet introduces a Depth Consistency Module (DCM) that learns a depth residual and uses it to update the fused depth, maintaining high-frequency details and recovering the fine structure.
Extensive experiments on public datasets show that ECFNet achieves state-of-the-art performance, especially in producing accurate edge depth. ECFNet also demonstrates superior robustness to degraded images compared to other methods.
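The data flow described above (three inputs, fusion, residual update) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the Sobel operator stands in for the paper's hybrid edge detection strategy, the edge-weighted blend stands in for LFM, and the residual is passed in by hand rather than learned by DCM. All function names (sobel_edge_map, edge_highlighted, fuse_depths, apply_residual) and the strength parameter are hypothetical, and the pre-trained MDE network is omitted.

```python
import numpy as np

def sobel_edge_map(gray):
    """Sobel gradient magnitude scaled to [0, 1] (assumed stand-in edge detector)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response by shifting the padded image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def edge_highlighted(gray, edges, strength=0.5):
    """Brighten pixels in proportion to edge strength (hypothetical scheme)."""
    return np.clip(gray + strength * edges, 0.0, 1.0)

def fuse_depths(d_image, d_edge, d_highlight, edges):
    """Toy fusion: trust the edge-derived depths near edges, the image depth elsewhere."""
    return (1.0 - edges) * d_image + edges * 0.5 * (d_edge + d_highlight)

def apply_residual(fused, residual):
    """Residual update: the corrected depth is the fused depth plus a residual."""
    return fused + residual

# Usage on a synthetic step image (dark left half, bright right half).
gray = np.zeros((8, 8))
gray[:, 4:] = 1.0
edges = sobel_edge_map(gray)
highlighted = edge_highlighted(gray, edges)
# Constant arrays stand in for the three initial depth maps from an MDE network.
fused = fuse_depths(np.full((8, 8), 2.0), np.full((8, 8), 4.0),
                    np.full((8, 8), 6.0), edges)
final = apply_residual(fused, np.full((8, 8), 0.1))
```

In this toy version the fused depth equals the image-derived depth in flat regions and the average of the two edge-derived depths on the step, which mirrors the idea of letting each input contribute where it is strongest.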
Stats
The paper reports the following key metrics:
SqRel: 7.236, 6.997, 7.035
ESR: 0.538, 0.494, 0.511
ORD: 0.329, 0.378, 0.375
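For orientation, SqRel conventionally denotes the squared relative error between predicted and ground-truth depth. The sketch below uses that standard definition; whether the paper applies additional scaling, alignment, or masking before computing it is not stated here, so treat this as an assumption. The function name sq_rel is hypothetical. ESR and ORD are not defined in this summary and are not reproduced.

```python
import numpy as np

def sq_rel(pred, gt):
    """Squared relative error: mean of (pred - gt)^2 / gt over pixels with gt > 0."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # skip pixels without valid ground truth (assumed convention)
    return float(np.mean((pred[valid] - gt[valid]) ** 2 / gt[valid]))

# Example: two valid pixels and one pixel with missing ground truth.
error = sq_rel([2.0, 4.0, 9.0], [1.0, 2.0, 0.0])
```

Lower values are better for this metric, since it penalises squared deviations relative to the true depth.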
Quotes
"The edge itself hides the most critical information."
"Estimating high-quality depth maps with accurate edges and details remains a significant obstacle for MDE networks."
"Driven by this analysis, we propose to explicitly employ the image edges as input for ECFNet and fuse the initial depths from different sources to produce the final depth."