Core Concepts
MRC-Net is a novel approach to estimating the 6-DoF pose of objects with a sequential pipeline built around multi-scale residual correlation, outperforming existing methods on challenging datasets.
Abstract
MRC-Net introduces a two-stage deep learning pipeline for pose estimation from RGB images. The method involves classification and regression stages, connected by a multi-scale residual correlation layer. Soft probabilistic labels are used to define pose classes, reducing ambiguity in classification for symmetric objects. The network is end-to-end trainable and achieves state-of-the-art accuracy on various benchmark datasets. Experiments show that the sequential approach significantly boosts performance compared to parallel methods. The multi-scale residual correlation captures correspondences between input and rendered images at different scales, improving discriminative features for pose estimation. Perspective correction enhances accuracy, while test-time augmentation further improves performance.
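The soft probabilistic pose labels mentioned above can be illustrated with a minimal sketch. Assumptions not taken from the paper: pose classes are represented by anchor rotation matrices, label mass decays with geodesic distance under a Gaussian kernel, and the kernel width `sigma` is a hypothetical choice.

```python
import numpy as np

def geodesic_distance(R1, R2):
    """Angle (radians) of the relative rotation R1^T R2."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def soft_pose_labels(R_gt, anchors, sigma=0.3):
    """Soft class distribution over anchor rotations: weight each
    class by a Gaussian of its geodesic distance to the ground-truth
    rotation, then normalize. Nearby classes share label mass, which
    is what reduces ambiguity for symmetric objects."""
    d = np.array([geodesic_distance(R_gt, Ra) for Ra in anchors])
    w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

# Two toy anchor classes: identity and a 90-degree rotation about z.
Rz90 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
anchors = [np.eye(3), Rz90]
labels = soft_pose_labels(np.eye(3), anchors)
```

For an object symmetric under one of the anchors, the ground-truth rotation sits equidistant from several classes and the soft label spreads evenly over them instead of forcing an arbitrary hard choice.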
Stats
We propose a single-shot approach to determining the 6-DoF pose of an object with an available 3D CAD model from a single RGB image.
Our method demonstrates state-of-the-art accuracy on four challenging BOP benchmark datasets:
T-LESS, LM-O, YCB-V, and ITODD.
The overall network, dubbed MRC-Net, is shown in Figure 1.
We validate our hypothesis in experiments, showing that the simple concept of sequential pose estimation with the MRC layer produces a major boost in performance. It outperforms state-of-the-art techniques on the BOP Challenge datasets without the need for pre-initialization, iterative refinement, or post-processing.
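The sequential idea can be sketched structurally: a first stage picks a coarse pose class, the object is rendered at that class pose, and a second stage regresses the residual between the real and rendered views. In this toy version the stages are stand-ins (`classify_pose` and `regress_residual` operate on ground-truth rotations directly, not on learned image features, and the anchor set is hypothetical):

```python
import numpy as np

# Coarse anchor rotations (hypothetical: identity and 180 deg about z).
ANCHORS = [np.eye(3),
           np.array([[-1., 0., 0.], [0., -1., 0.], [0., 0., 1.]])]

def classify_pose(R_true):
    """Stage 1 stand-in: pick the anchor closest to the observation.
    In MRC-Net this is a learned classifier over image features."""
    def angle(Ra):
        c = (np.trace(Ra.T @ R_true) - 1.0) / 2.0
        return np.arccos(np.clip(c, -1.0, 1.0))
    return int(np.argmin([angle(Ra) for Ra in ANCHORS]))

def regress_residual(R_true, R_anchor):
    """Stage 2 stand-in: residual rotation from anchor to truth.
    In MRC-Net this is regressed from correlation features between
    the real image and a rendering at the anchor pose."""
    return R_anchor.T @ R_true

def estimate_pose(R_true):
    k = classify_pose(R_true)
    R_res = regress_residual(R_true, ANCHORS[k])
    return ANCHORS[k] @ R_res  # class pose composed with residual

# A small rotation near identity: class 0 is selected and the
# residual recovers the remaining offset exactly.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0., 0., 1.]])
R_hat = estimate_pose(R)
```

The key design point is the decomposition: classification bounds the residual to a small neighborhood of the anchor, which is an easier regression target than the full rotation.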
To summarize, the main contributions of this work are:
• MRC-Net employs a Siamese network with shared weights between both stages to learn embeddings for input and rendered images.
• A novel MRC layer implicitly captures correspondences between input and rendered images at both global and local scales.
• State-of-the-art accuracy on a variety of BOP benchmark datasets, improving average recall by 2.4% over results reported by competing methods.
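The correlation idea behind the MRC layer, matching real and rendered feature maps at several resolutions, can be sketched with a toy numpy version. The actual layer operates on learned Siamese embeddings and propagates residuals between scales; the pooling scheme, neighborhood radius, and scale count here are illustrative assumptions.

```python
import numpy as np

def local_correlation(f_real, f_rend, radius=1):
    """Correlate each location of f_real with a (2*radius+1)^2
    neighborhood of f_rend. Inputs have shape (C, H, W); the output
    has one channel per displacement, shape ((2r+1)^2, H, W)."""
    C, H, W = f_real.shape
    pad = np.pad(f_rend, ((0, 0), (radius, radius), (radius, radius)))
    out = np.zeros(((2 * radius + 1) ** 2, H, W))
    k = 0
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = pad[:, dy:dy + H, dx:dx + W]
            out[k] = (f_real * shifted).sum(axis=0)  # per-pixel dot product
            k += 1
    return out

def downsample(f):
    """2x average pooling over the spatial dimensions."""
    C, H, W = f.shape
    return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def multi_scale_correlation(f_real, f_rend, num_scales=2):
    """Correlation volumes at progressively coarser scales: fine scales
    capture local correspondences, coarse scales capture global ones."""
    vols = []
    for _ in range(num_scales):
        vols.append(local_correlation(f_real, f_rend))
        f_real, f_rend = downsample(f_real), downsample(f_rend)
    return vols

rng = np.random.default_rng(0)
f_real = rng.standard_normal((8, 16, 16))
f_rend = rng.standard_normal((8, 16, 16))
vols = multi_scale_correlation(f_real, f_rend)
```

Each volume indexes how well a real-image feature matches rendered features at nearby displacements; stacking scales is what lets the layer capture both global and local correspondences, as the contribution list states.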