Key Concepts
The proposed end-to-end framework accurately reconstructs dense 3D meshes of both hands from a single RGB-D input by effectively fusing color and depth information through a novel pyramid deep fusion network.
Summary
The paper presents an end-to-end framework for reconstructing dense 3D meshes of both hands from a single RGB-D input. The key highlights are:
Feature Extraction:
RGB features are extracted using ResNet50, while point cloud features are extracted using PointNet++.
The depth map is converted to an unordered point cloud to preserve more geometric details.
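Converting a depth map to a point cloud is standard pinhole back-projection. Below is a minimal numpy sketch of that step; the function name and parameters (`fx`, `fy`, `cx`, `cy` are the camera intrinsics) are illustrative, not taken from the paper.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into an N x 3 point
    cloud using pinhole camera intrinsics. Pixels with zero depth are
    treated as invalid and dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only valid depths
```

The resulting unordered point set is what a PointNet++-style backbone consumes directly, without the quantization loss of voxelizing the depth map.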
Pyramid Deep Fusion Network (PDFNet):
PDFNet fuses the RGB and point cloud features at multiple scales using a pyramid structure.
It employs a feature transformation network to adaptively allocate weights to the two feature modalities, mitigating interference from local unreliable regions.
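The core idea of adaptively weighting the two modalities can be sketched as a small gating function: predict per-point weights from the concatenated features, then take a weighted sum. This is a simplified numpy stand-in for the paper's feature transformation network (the gate here is a single linear map, `w_gate`, which is an assumption for illustration).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(rgb_feat, pc_feat, w_gate):
    """Fuse two modality features (each N x C) with learned per-point
    weights. `w_gate` (2C x 2) maps the concatenated features to two
    logits; softmax turns them into a convex combination, so an
    unreliable modality at a given point can be down-weighted."""
    logits = np.concatenate([rgb_feat, pc_feat], axis=-1) @ w_gate  # N x 2
    w = softmax(logits, axis=-1)
    return w[:, :1] * rgb_feat + w[:, 1:] * pc_feat
```

With a zero gate both modalities receive weight 0.5; after training, the gate learns to suppress blurred RGB regions or noisy depth points locally.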
GCN-based Decoder:
A GCN-based decoder processes the fused features to recover the 3D pose and dense mesh of both hands.
The decoder uses the hand center as a representation to handle hands at arbitrary positions within the field of view.
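A single graph-convolution step over mesh vertices has the familiar aggregate-then-transform shape. The toy sketch below (mean aggregation with self-loops, then a linear map and ReLU) illustrates the mechanism only; the paper's decoder is considerably more elaborate.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step over mesh vertices.
    A: V x V adjacency of the mesh graph, H: V x Cin vertex features,
    W: Cin x Cout weights. Each vertex averages itself and its
    neighbours, then applies a shared linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)  # mean normalization
    return np.maximum(D_inv * (A_hat @ H) @ W, 0.0)
```

Stacking such layers lets fused image/point features propagate along the mesh topology, which is why GCN decoders are a natural fit for dense hand-mesh regression.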
Comprehensive Experiments:
The proposed method outperforms state-of-the-art approaches on publicly available two-hand datasets, demonstrating the effectiveness of the fusion algorithm.
Ablation studies validate the contributions of different components, such as the depth input, PDFNet, and the GCN-based decoder.
Statistics
The absolute position error (MPJPE) is 9.64 mm for the left hand and 11.62 mm for the right hand.
The relative position error (AL-MPJPE) is 6.93 mm for the left hand and 8.74 mm for the right hand.
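These metrics are standard in 3D hand-pose evaluation: MPJPE averages the Euclidean distance between predicted and ground-truth joints, while the aligned variant first subtracts a root joint to remove global translation. A minimal sketch (the root-joint index is an assumption; papers typically use the wrist):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints (both J x 3 arrays)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def aligned_mpjpe(pred, gt, root=0):
    """Root-aligned MPJPE: subtract a root joint (e.g. the wrist) from
    both sets before measuring, isolating relative pose error from
    global translation error."""
    return mpjpe(pred - pred[root:root + 1], gt - gt[root:root + 1])
```

A prediction that is perfect up to a rigid translation scores zero on the aligned metric but not on the absolute one, which is why the paper reports both.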
Quotes
"Accurately recovering the dense 3D mesh of both hands from monocular images poses considerable challenges due to occlusions and projection ambiguity."
"The primary challenge lies in effectively utilizing two different input modalities to mitigate the blurring effects in RGB images and noises in depth images."
"We devise a novel fusion module named PDFNet that effectively harnesses both color information and depth maps."