The paper proposes HandDiff, a diffusion-based model for 3D hand pose estimation. The key highlights are:
HandDiff takes depth images and point clouds as input and uses a diffusion process to iteratively refine the 3D hand pose.
It introduces a joint-wise condition extraction module to capture individual joint features, and a local feature-conditioned denoiser to leverage detailed observations around each joint.
The denoiser also incorporates a kinematic correspondence-aware aggregation block to model the dependencies between joints, further enhancing the estimation accuracy.
Extensive experiments on four challenging benchmarks demonstrate that HandDiff outperforms previous state-of-the-art methods by a significant margin. The benchmarks comprise three single-hand datasets (ICVL, MSRA, NYU) and one hand-object interaction dataset (DexYCB).
Ablation studies validate the effectiveness of the proposed components, including the joint-wise conditions, local features, and kinematic correspondence modeling.
The model achieves state-of-the-art performance with only a small number of denoising steps and multiple hypotheses. The low step count keeps inference efficient.
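The following is a minimal sketch of diffusion-based refinement under stated assumptions. It is not HandDiff's implementation. It assumes a DDIM-style deterministic sampler, a linear noise schedule, and a denoiser that predicts the clean pose; the summary names none of these. The function names (`make_alpha_bars`, `ddim_sample`, `estimate_pose`) are hypothetical, and so is averaging as the way to fuse hypotheses. `oracle_denoiser` is a toy placeholder for the network, which in HandDiff would be conditioned on joint-wise features from the depth image and point cloud.

```python
import math
import random
from typing import Callable, List, Sequence

Pose = List[float]  # flattened (x, y, z) coordinates of J joints


def make_alpha_bars(num_train_steps: int = 1000,
                    beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def ddim_sample(denoiser: Callable[[Pose, int], Pose], dim: int, num_steps: int,
                alpha_bars: Sequence[float], rng: random.Random) -> Pose:
    """Deterministic DDIM sampling (eta = 0): start from Gaussian noise and
    refine the pose over `num_steps` denoising steps."""
    total = len(alpha_bars)
    stride = max(1, total // num_steps)
    # Evenly spaced timesteps from most to least noisy, e.g. 999, 799, ..., 199.
    timesteps = list(range(total - 1, -1, -stride))[:num_steps]
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x0_hat = denoiser(x, t)  # the (hypothetical) network predicts the clean pose
        # Noise implied by the current sample and the predicted clean pose.
        eps_hat = [(xi - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)
                   for xi, x0 in zip(x, x0_hat)]
        # Move to the next, less noisy timestep (exactly x0_hat on the last step).
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
             for x0, e in zip(x0_hat, eps_hat)]
    return x


def estimate_pose(denoiser: Callable[[Pose, int], Pose], dim: int,
                  num_steps: int = 5, num_hypotheses: int = 4, seed: int = 0) -> Pose:
    """Draw several hypotheses from different noise samples and average them."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars()
    hyps = [ddim_sample(denoiser, dim, num_steps, alpha_bars, rng)
            for _ in range(num_hypotheses)]
    return [sum(vals) / num_hypotheses for vals in zip(*hyps)]


# Toy stand-in for the conditioned denoiser: it ignores its inputs and always
# predicts this fixed two-joint pose, so the sampler should return it.
TARGET = [0.0, 0.1, 0.5, 0.02, 0.15, 0.48]


def oracle_denoiser(x_t: Pose, t: int) -> Pose:
    return list(TARGET)


pose = estimate_pose(oracle_denoiser, dim=len(TARGET))
```

The sampler jumps between widely spaced timesteps, so `num_steps` directly sets the number of denoiser calls per hypothesis. Averaging is only one possible way to fuse hypotheses.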
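The idea of conditioning on observations around each joint can be illustrated with a k-nearest-neighbour lookup in the point cloud. This is a hypothetical, hand-crafted stand-in. HandDiff's joint-wise conditions and local features are learned. The names `knn_indices` and `joint_local_features`, the choice of `k`, and the four-number descriptor are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn_indices(query: Point, cloud: Sequence[Point], k: int) -> List[int]:
    """Indices of the k cloud points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(cloud)), key=lambda i: math.dist(query, cloud[i]))
    return order[:k]


def joint_local_features(joints: Sequence[Point], cloud: Sequence[Point],
                         k: int = 8) -> List[List[float]]:
    """For each joint, summarise its k nearest points as
    [mean dx, mean dy, mean dz, mean distance] relative to the joint.
    A hand-crafted stand-in for learned per-joint local features."""
    feats = []
    for joint in joints:
        idx = knn_indices(joint, cloud, k)
        n = len(idx)  # may be smaller than k if the cloud is small
        offsets = [[cloud[i][d] - joint[d] for d in range(3)] for i in idx]
        mean_offset = [sum(o[d] for o in offsets) / n for d in range(3)]
        mean_dist = sum(math.dist(joint, cloud[i]) for i in idx) / n
        feats.append(mean_offset + [mean_dist])
    return feats


cloud = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
joints = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = joint_local_features(joints, cloud, k=2)
```

Each joint gets its own descriptor from nearby points only, which mirrors the notion of joint-wise conditions rather than a single global feature for the whole hand.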
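Modelling dependencies between joints can likewise be sketched as message passing over the hand's kinematic tree. The block below swaps in plain neighbour averaging for HandDiff's learned kinematic correspondence-aware aggregation. The 21-joint `HAND_PARENTS` layout, the blend `weight`, and all function names are assumptions.

```python
from typing import List, Sequence

# A common 21-joint hand ordering (wrist = 0, then four joints per finger from
# base to tip). Assumed for illustration; joint sets and orderings differ
# across datasets such as ICVL, MSRA and NYU.
HAND_PARENTS = [-1, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11,
                0, 13, 14, 15, 0, 17, 18, 19]


def kinematic_neighbors(parents: Sequence[int]) -> List[List[int]]:
    """Parent and child indices of every joint; the root has parent -1."""
    nbrs: List[List[int]] = [[] for _ in parents]
    for j, p in enumerate(parents):
        if p >= 0:
            nbrs[j].append(p)
            nbrs[p].append(j)
    return nbrs


def aggregate_along_skeleton(feats: Sequence[Sequence[float]],
                             parents: Sequence[int],
                             weight: float = 0.5) -> List[List[float]]:
    """One round of message passing over the kinematic tree: blend each joint's
    feature with the mean feature of its kinematically connected joints."""
    nbrs = kinematic_neighbors(parents)
    out = []
    for j, f in enumerate(feats):
        if not nbrs[j]:  # isolated joint: keep its feature unchanged
            out.append(list(f))
            continue
        mean_nb = [sum(feats[n][d] for n in nbrs[j]) / len(nbrs[j])
                   for d in range(len(f))]
        out.append([(1 - weight) * a + weight * b for a, b in zip(f, mean_nb)])
    return out


# Toy three-joint chain 0 -> 1 -> 2 with one-dimensional features.
chain = aggregate_along_skeleton([[0.0], [1.0], [2.0]], parents=[-1, 0, 1])
```

Stacking several such rounds lets information travel between fingertips and the wrist. A learned block would replace the fixed average with trainable weights.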
Key insights distilled from:
by Wencan Cheng... on arxiv.org 04-05-2024
https://arxiv.org/pdf/2404.03159.pdf