The paper proposes a novel stereo matching method called Diffusion Models for Iterative Optimization (DMIO) that incorporates diffusion models into the iterative optimization process. The key contributions are:
DMIO reformulates the iterative optimization process of stereo matching as an image-to-image translation diffusion model, providing a new direction for the application of diffusion models.
A novel Time-based Gated Recurrent Unit (T-GRU) is designed as the iterative update operator, which includes a time encoder and an optional agent attention mechanism.
An attention-based context network is introduced to capture a large amount of contextual information, utilizing channel self-attention and a feed-forward network.
Experiments on several public benchmarks show that DMIO achieves competitive stereo matching performance, ranking first on the Scene Flow dataset and requiring only 8 iterations to achieve state-of-the-art results.
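The summary does not say how the T-GRU's time encoder is built. As a minimal sketch, assuming it resembles the sinusoidal timestep embedding commonly used in diffusion models (an assumption, not something the source confirms), the encoder could turn an integer step index into a fixed-length vector like this. The function name and the `dim` and `max_period` parameters are hypothetical.

```python
import math
from typing import List


def sinusoidal_time_embedding(t: int, dim: int, max_period: float = 10000.0) -> List[float]:
    """Encode timestep t as a vector of length dim (dim must be even).

    Assumption: this is the generic diffusion-model embedding, used here
    only to illustrate what a "time encoder" might produce.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decreasing.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half holds sines, second half holds cosines.
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]
```

In a time-conditioned recurrent unit, a vector like this would typically be combined with the hidden state or input features so that the update can depend on the current iteration.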
The paper first discusses the limitations of existing iterative optimization-based stereo matching methods, which rely on RNN variants and whose discrete optimization process leads to information loss. It then proposes DMIO as a solution, drawing inspiration from recent work on diffusion models.
The DMIO architecture consists of a weight-sharing feature network, a cost volume, the attention-based context network, the bridge diffusion disparity refinement, and the T-GRU-based update operator. The forward diffusion process maps the initial disparity to the ground truth, while the reverse process progressively refines the disparity.
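The progressive refinement described above can be pictured as a loop that repeatedly applies an update operator to the current disparity estimate. The sketch below is illustrative only: `refine`, `make_toy_update`, and the flattened-list `Disparity` type are hypothetical names, and the toy operator is handed the target directly, whereas a real update operator such as the T-GRU would have to predict the residual from the cost volume and context features.

```python
from typing import Callable, List

Disparity = List[float]  # a flattened disparity map, for illustration only


def refine(initial: Disparity,
           update_op: Callable[[Disparity, int], Disparity],
           num_steps: int = 8) -> Disparity:
    """Run num_steps reverse steps, adding the predicted residual each time."""
    disparity = list(initial)
    for t in range(num_steps, 0, -1):  # timesteps T, T-1, ..., 1
        delta = update_op(disparity, t)  # stand-in for the learned update operator
        disparity = [d + r for d, r in zip(disparity, delta)]
    return disparity


def make_toy_update(target: Disparity) -> Callable[[Disparity, int], Disparity]:
    """Hypothetical operator: at step t, move 1/t of the remaining way to target."""
    def update(disparity: Disparity, t: int) -> Disparity:
        return [(g - d) / t for d, g in zip(disparity, target)]
    return update


# Example: eight steps carry the initial estimate to the target along a straight line.
refined = refine([0.0, 10.0], make_toy_update([4.0, 2.0]), num_steps=8)
```

The default of 8 steps mirrors the iteration count quoted in the summary; everything else about the schedule is an assumption made for the sake of the example.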
Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of DMIO, which is reported to outperform state-of-the-art methods in both accuracy and efficiency.
Key insights distilled from the paper by Yuguang Shi at arxiv.org, 04-16-2024
https://arxiv.org/pdf/2404.09051.pdf