Key Concepts
Integrating pose tokens improves 3D human mesh recovery in occlusion scenarios.
Summary
The paper introduces PostoMETRO, a framework for enhancing 3D human mesh recovery by integrating pose tokens. It addresses the challenges of single-image human mesh recovery, with a focus on occlusion scenarios. By condensing 2D pose data into discrete pose tokens and combining them with image tokens in a transformer, PostoMETRO achieves a robust integration of pose and image information. The method improves performance across standard benchmarks, including extreme object-occlusion and person-occlusion scenarios.
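The core mechanism described above is that discrete pose tokens are concatenated with image tokens and processed jointly by a transformer. Below is a minimal PyTorch sketch of that fusion step; it is not the authors' implementation, and the class name, token counts, and dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of fusing pose tokens with image
# tokens in a single transformer encoder. All names, token counts, and
# dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class TokenFusionEncoder(nn.Module):
    def __init__(self, dim=256, num_layers=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, image_tokens, pose_tokens):
        # image_tokens: (B, N_img, dim), e.g. a flattened backbone feature map
        # pose_tokens:  (B, N_pose, dim), e.g. embedded discrete pose codes
        tokens = torch.cat([image_tokens, pose_tokens], dim=1)
        # The fused token sequence would then feed a mesh regression head.
        return self.encoder(tokens)

# Dummy usage
fusion = TokenFusionEncoder()
img_tokens = torch.randn(2, 49, 256)     # e.g. a 7x7 feature map, flattened
pose_tokens = torch.randn(2, 34, 256)    # e.g. 34 pose tokens
fused = fusion(img_tokens, pose_tokens)  # (2, 83, 256)
```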
Directory:
- Abstract
  - Recent advancements in single-image-based human mesh recovery.
  - Interest in enhancing performance under occlusion.
  - Leveraging rich 2D pose annotations for 3D reconstruction.
- Introduction
  - Importance of 3D human pose and shape estimation.
  - Challenges in monocular camera settings.
- Methodology: Pose Tokenizer
  - Transforming 2D poses into token sequences with a VQ-VAE (see the quantization sketch after this directory).
  - Training scheme for learning the pose tokenizer.
- Overall Pipeline
  - Using transformers to regress the human mesh from a single image.
  - Encoder-decoder architecture incorporating image and pose tokens.
- Experimental Results
  - Performance comparisons across datasets, including object-occlusion and person-occlusion scenarios.
- Ablation Studies
  - Effect of token type (image vs. pose) on model performance.
  - Impact of modulator choice (linear vs. mixer).
- Occlusion Sensitivity Analysis
  - Per-joint breakdown of mean 3D error for occluded body parts.
- Conclusion
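The pose tokenizer outlined in the methodology section rests on vector quantization: continuous features computed from 2D keypoints are snapped to the nearest entries of a learned codebook, and the resulting indices serve as discrete pose tokens. The sketch below shows only that quantization step, under an assumed codebook size and feature dimension; it is not the paper's actual tokenizer.

```python
# Illustrative-only sketch of the vector-quantization step behind a VQ-VAE
# pose tokenizer: encoder features derived from 2D keypoints are mapped to
# their nearest codebook entries, and the indices act as discrete pose tokens.
# Codebook size and feature dimension are assumptions.
import torch
import torch.nn as nn

class PoseQuantizer(nn.Module):
    def __init__(self, num_codes=2048, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (B, T, dim) continuous features produced by a pose encoder
        # Squared L2 distance from every feature to every codebook entry
        dist = (z.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)  # (B, T, K)
        token_ids = dist.argmin(dim=-1)           # discrete pose tokens
        z_q = self.codebook(token_ids)            # quantized features
        # Straight-through estimator so gradients reach the encoder in training
        z_q = z + (z_q - z).detach()
        return token_ids, z_q
```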
Statistics
Experiments report an MPVPE of 76.8 mm, an MPJPE of 67.7 mm, and a PA-MPJPE of 39.8 mm with the HRNet-W48 backbone on the 3DPW test set.
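For context on these numbers, the metrics measure the mean Euclidean error in millimetres per 3D joint (MPJPE) or mesh vertex (MPVPE); PA-MPJPE first rigidly aligns the prediction to the ground truth with a Procrustes (similarity) transform. A simplified sketch of how such metrics are typically computed, not the paper's evaluation code:

```python
# Simplified sketch of the standard metrics: mean Euclidean error per joint
# (MPJPE) or per mesh vertex (MPVPE); PA-MPJPE applies a Procrustes alignment
# (rotation, scale, translation) before measuring the error.
import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (J, 3) joint (or vertex, for MPVPE) positions in mm
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    # Rigidly align pred to gt (Kabsch/Procrustes), then measure the error
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:         # correct an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    s = S.sum() / (p ** 2).sum()     # optimal isotropic scale
    return mpjpe(s * p @ R.T + mu_g, gt)
```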
Quotes
"In this paper, we present PostoMETRO, a novel paradigm to improve the performance of non-parametric model under occlusion scenarios."
"Our main contributions are summarized as follows: We propose PostoMETRO, a novel framework to incorporate 2D pose into transformers to help 3D human mesh estimation."