The paper introduces Key2Mesh, a model that takes 2D human pose keypoints as input and estimates the corresponding 3D body mesh. The key highlights are:
The model is trained on large-scale motion capture (MoCap) datasets, which provide extensive 3D details but lack corresponding visual data. This overcomes the scarcity of image datasets with 3D labels.
To apply the model to RGB images, the authors first run an off-the-shelf 2D pose estimator to obtain 2D keypoints, and then feed these keypoints to Key2Mesh.
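A minimal sketch of this keypoints-to-mesh pipeline is shown below. It is not the authors' code: the module name `Key2MeshRegressor`, the MLP backbone, the COCO-style 17-keypoint input, and the SMPL pose/shape output parameterization are all assumptions, and the 2D detector's output is simulated with random keypoints where a real pipeline would call an off-the-shelf estimator.

```python
# Hypothetical sketch of the 2D-keypoints -> body-mesh inference pipeline.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17            # COCO-style 2D keypoints (assumption)
POSE_DIM, SHAPE_DIM = 72, 10  # SMPL axis-angle pose + shape coefficients

class Key2MeshRegressor(nn.Module):
    """Hypothetical regressor: flattened 2D keypoints -> SMPL parameters."""
    def __init__(self, hidden=1024):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(NUM_KEYPOINTS * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.pose_head = nn.Linear(hidden, POSE_DIM)
        self.shape_head = nn.Linear(hidden, SHAPE_DIM)

    def forward(self, kp2d):                    # kp2d: (B, NUM_KEYPOINTS, 2)
        feats = self.backbone(kp2d.flatten(1))  # (B, hidden)
        return self.pose_head(feats), self.shape_head(feats)

# Stand-in for "run an off-the-shelf 2D pose estimator on the image":
keypoints_2d = torch.rand(1, NUM_KEYPOINTS, 2)   # normalized (x, y) per joint

model = Key2MeshRegressor()
pose, shape = model(keypoints_2d)                # SMPL params; a mesh follows via an SMPL layer
print(pose.shape, shape.shape)                   # torch.Size([1, 72]) torch.Size([1, 10])
```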
To improve the performance of Key2Mesh on RGB images, the authors apply an adversarial domain adaptation (DA) method to bridge the gap between the MoCap and visual domains. This DA method does not require 3D labels for visual data, enabling adaptation to target sets without the need for costly labels.
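The sketch below illustrates one generic way such an adversarial adaptation step can look; it is an assumption-laden illustration, not the paper's exact scheme (the paper may use gradient reversal, different losses, or different inputs to the discriminator). A domain discriminator learns to separate MoCap-derived keypoint features from visual-domain keypoint features, and the feature extractor is updated to fool it, without any 3D labels for the visual batch.

```python
# Generic adversarial domain-adaptation step (illustrative only; module sizes,
# optimizers, and loss weights are assumptions, not the authors' settings).
import torch
import torch.nn as nn

feat_dim = 1024
extractor = nn.Sequential(nn.Linear(17 * 2, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, feat_dim))
discriminator = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                              nn.Linear(256, 1))

opt_f = torch.optim.Adam(extractor.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

mocap_kp = torch.rand(32, 17 * 2)    # keypoints projected from MoCap data
visual_kp = torch.rand(32, 17 * 2)   # keypoints from a 2D detector on RGB images

# 1) Train the discriminator to tell the two domains apart.
d_loss = bce(discriminator(extractor(mocap_kp).detach()), torch.ones(32, 1)) + \
         bce(discriminator(extractor(visual_kp).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Train the extractor so visual-domain features look like MoCap features.
g_loss = bce(discriminator(extractor(visual_kp)), torch.ones(32, 1))
opt_f.zero_grad(); g_loss.backward(); opt_f.step()
```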
Evaluation on the H3.6M and 3DPW datasets shows that Key2Mesh outperforms other models in PA-MPJPE for both datasets, and in MPJPE and PVE for the 3DPW dataset.
Thanks to its simple architecture, Key2Mesh operates at least 12× faster than the prior state-of-the-art model, LGD.