Core Concepts
PostoMETRO integrates occlusion-resilient 2D pose representation into transformers for robust 3D human mesh recovery.
Summary
Directory:
Abstract
Introduction
Traditional Approach vs. PostoMETRO
Methodology: Pose Tokenizer, Overall Pipeline, Loss Design
Experiments: Datasets, Implementation Details, Evaluation Metrics
Main Results: Quantitative and Qualitative Results, Training/Inference Time Comparison
Ablation Studies: Effect of Different Tokens, Accuracy of Pose Tokens, Ablation of Mixer Layers, Occlusion Sensitivity Analysis
Abstract:
Recent advancements in single-image-based human mesh recovery have led to interest in enhancing performance under extreme scenarios like occlusion.
PostoMETRO integrates occlusion-resilient 2D pose representation into transformers for more precise 3D coordinate decoding.
Experiments demonstrate the effectiveness of PostoMETRO on standard and occlusion-specific benchmarks.
Introduction:
Challenges in 3D human pose estimation under monocular camera settings include depth ambiguity and occlusion issues.
Existing methods struggle with severe occlusion impacting alignment between human mesh vertices and image pixels.
Traditional Approach vs. PostoMETRO:
Traditional methods lift 2D pose information to estimate the 3D pose and mesh, whereas PostoMETRO encodes the 2D pose as pose tokens and feeds them to the transformer together with image tokens.
This combination keeps the rich texture cues of the image while adding occlusion-resilient pose information.
Methodology:
The pose tokenizer compresses 2D poses into token sequences using a VQ-VAE.
The overall pipeline uses transformer encoders and decoders to pass messages among the camera token, image tokens, and pose tokens.
The loss design penalizes errors in vertex coordinates, errors in 3D joint positions, and misalignment with ground-truth 2D joints.
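The two most mechanical steps above, turning pose features into discrete tokens and combining the three loss terms, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (quantize_pose, l1, total_loss), the toy codebook, the choice of L1 penalties, and the unit loss weights are all assumptions made for this example. The nearest-codebook lookup is the standard VQ-VAE quantization step.

```python
def quantize_pose(features, codebook):
    """Map each continuous feature vector to the index of its nearest
    codebook entry (squared Euclidean distance), as in standard VQ-VAE."""
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def l1(pred, target):
    """Mean absolute error over two equally shaped lists of coordinate tuples."""
    total = sum(abs(p - t) for pv, tv in zip(pred, target) for p, t in zip(pv, tv))
    count = sum(len(pv) for pv in pred)
    return total / count


def total_loss(pred, gt, w_vertex=1.0, w_joint3d=1.0, w_joint2d=1.0):
    """Weighted sum of the three penalties named in the summary.
    The weights and the use of L1 are assumptions for illustration."""
    return (w_vertex * l1(pred["vertices"], gt["vertices"])
            + w_joint3d * l1(pred["joints3d"], gt["joints3d"])
            + w_joint2d * l1(pred["joints2d"], gt["joints2d"]))


# Toy usage: a hypothetical 3-entry codebook over 2-D features.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.8)]
tokens = quantize_pose(features, codebook)  # [0, 1, 2]

pred = {"vertices": [(0.0, 0.0, 0.0)], "joints3d": [(1.0, 2.0, 3.0)], "joints2d": [(0.0, 0.0)]}
gt = {"vertices": [(1.0, 0.0, 0.0)], "joints3d": [(1.0, 2.0, 3.0)], "joints2d": [(0.0, 2.0)]}
loss = total_loss(pred, gt)
```

In a real model the 2D joints in the last term would come from projecting the predicted 3D joints with the predicted camera parameters. Here they are passed in directly to keep the sketch self-contained.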
Experiments:
The model is trained on datasets such as Human3.6M and evaluated on benchmarks such as 3DPW-OCC to show its effectiveness under different scenarios.
The PyTorch implementation is competitive with other baselines in training and inference time.
Main Results:
Quantitative results show state-of-the-art performance on various datasets, including object-occlusion and person-occlusion scenarios.
Qualitative evaluation highlights improved robustness to occluded body parts compared to baseline methods.
Ablation Studies:
Combining image tokens with pose tokens proves effective, yielding superior performance across different dataset splits.
Pose tokens derived from ground-truth 2D poses significantly improve 3D human mesh recovery performance.
Statistics
PostoMETRO combines image tokens and pose tokens to achieve robust 3D human mesh recovery.