The study addresses the misalignment between predicted meshes and image evidence in 3D human pose estimation. Starting from the initial mesh produced by a regression-based pose estimator, the method learns dense per-pixel correspondences between a synthetic rendering of that mesh and the input RGB image, so the network can exploit appearance and depth cues while only having to predict small displacements that adapt to typical prediction errors. The predicted per-pixel displacements then drive a refinement step that minimizes a reprojection loss, which improves both image-model alignment and 3D accuracy in realistic scenarios. This places the work among methods that refine regressed human mesh predictions, where accurate estimates matter for downstream applications.
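The refinement step described above can be illustrated as fitting parameters by gradient descent on a reprojection loss against displacement-corrected 2D targets. The sketch below is a minimal, self-contained illustration only: it uses a simplified weak-perspective camera and optimizes just the camera scale and translation on synthetic data. All function names, the camera model, and the optimization details are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def project(verts, scale, trans):
    # Simplified weak-perspective projection (assumed camera model):
    # drop z, then scale and translate in the image plane.
    return scale * verts[:, :2] + trans

def refine_by_reprojection(verts, target_2d, lr=0.1, steps=200):
    # Minimize mean squared reprojection error over (scale, trans)
    # by plain gradient descent; stands in for the paper's refinement.
    scale, trans = 1.0, np.zeros(2)
    for _ in range(steps):
        resid = project(verts, scale, trans) - target_2d  # per-point 2D error
        # Analytic gradients of mean ||s*v_xy + t - y||^2
        grad_s = 2.0 * np.mean(np.sum(resid * verts[:, :2], axis=1))
        grad_t = 2.0 * resid.mean(axis=0)
        scale -= lr * grad_s
        trans -= lr * grad_t
    return scale, trans

# Synthetic check: 2D targets come from a known camera, playing the role of
# displacement-corrected correspondences; refinement should recover it.
rng = np.random.default_rng(0)
verts = rng.normal(size=(100, 3))
true_scale, true_trans = 2.0, np.array([0.3, -0.2])
target_2d = project(verts, true_scale, true_trans)
scale, trans = refine_by_reprojection(verts, target_2d)
```

In the actual method, the 2D targets would be the rendered-mesh pixels shifted by the learned per-pixel displacements, and the optimized variables would include pose and shape parameters rather than only a camera.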
Key insights extracted from the paper by Tom Wehrbein... at arxiv.org, 03-19-2024.
https://arxiv.org/pdf/2403.11634.pdf