Key Concepts
The paper proposes a data-driven approach to improve hand region of interest (ROI) estimation in MediaPipe Holistic. It uses an enriched feature set, including additional keypoints and the z-dimension, to improve accuracy and robustness across diverse hand orientations.
Abstract
The paper addresses a flaw in the hand ROI prediction of MediaPipe Holistic, which struggles with non-ideal hand orientations and thereby reduces the accuracy of downstream applications such as sign language recognition. The authors propose a data-driven approach that estimates the ROI from an enriched feature set: additional keypoints (shoulder, elbow, thumb) and the z-dimension, alongside the existing wrist, index, and pinky keypoints.
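As a rough illustration of what "enriched feature set" means in practice, the sketch below flattens keypoints into a model input. The keypoint names, their order, the dictionary layout, and the helper `build_features` are all assumptions made for this example; the summary does not specify how the authors encode their inputs.

```python
# Hypothetical keypoint names and ordering; the source lists these keypoints
# but does not say how they are encoded.
BASELINE_KEYPOINTS = ("wrist", "index", "pinky")
ENRICHED_KEYPOINTS = ("shoulder", "elbow", "wrist", "thumb", "index", "pinky")


def build_features(keypoints, names, use_z):
    """Flatten the selected keypoints into a single list of coordinates."""
    features = []
    for name in names:
        x, y, z = keypoints[name]
        features.extend((x, y, z) if use_z else (x, y))
    return features


# Made-up normalized coordinates, for illustration only.
example = {
    "shoulder": (0.40, 0.30, -0.10),
    "elbow": (0.45, 0.50, -0.20),
    "wrist": (0.50, 0.70, -0.30),
    "thumb": (0.52, 0.74, -0.32),
    "index": (0.55, 0.78, -0.35),
    "pinky": (0.48, 0.79, -0.33),
}

baseline = build_features(example, BASELINE_KEYPOINTS, use_z=False)
enriched = build_features(example, ENRICHED_KEYPOINTS, use_z=True)
print(len(baseline), len(enriched))
```

Under these assumptions the baseline input has 6 values (three keypoints in 2D) and the enriched input has 18 (six keypoints in 3D).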
The authors evaluate their approach on the Panoptic Hand DB dataset, comparing it against the original MediaPipe Holistic method. The new method achieves higher Intersection-over-Union (IoU) than the current one. Specifically, the authors train three separate multilayer perceptrons (MLPs) to predict the center, size, and angle of the hand ROI. The MLP-based approach outperforms the original heuristic in center and size prediction, although it struggles with angle prediction.
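To make the IoU metric concrete for an ROI described by center, size, and angle, here is a minimal sketch in plain Python. It assumes the ROI is a square given as `(cx, cy, size, angle)` with the angle in radians, and it uses Sutherland-Hodgman polygon clipping to find the overlap. This is an illustration of the metric, not the authors' evaluation code, and the function names are hypothetical.

```python
import math


def roi_corners(cx, cy, size, angle):
    """Corners of a square ROI rotated by `angle` radians about its center."""
    h = size / 2.0
    c, s = math.cos(angle), math.sin(angle)
    local = [(-h, -h), (h, -h), (h, h), (-h, h)]  # counter-clockwise order
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]


def polygon_area(pts):
    """Signed area via the shoelace formula."""
    area = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def clip(subject, clipper):
    """Sutherland-Hodgman: clip `subject` against a convex, CCW `clipper`."""
    output = subject
    for i in range(len(clipper)):
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]

        def side(p):
            # Positive when p lies to the left of edge a->b (inside for CCW).
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # edge p->q crosses the clipping line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
        if not output:
            break
    return output


def roi_iou(a, b):
    """IoU of two square ROIs, each given as (cx, cy, size, angle)."""
    overlap = clip(roi_corners(*a), roi_corners(*b))
    inter = abs(polygon_area(overlap)) if len(overlap) >= 3 else 0.0
    union = a[2] ** 2 + b[2] ** 2 - inter
    return inter / union if union > 0 else 0.0


print(roi_iou((0, 0, 2, 0), (1, 0, 2, 0)))
```

Because the overlap is computed on the rotated polygons, an error in any of the three predicted quantities (center, size, or angle) lowers the score, which is why a single IoU value can summarize all three predictions.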
The authors also note that their proposed solution, while an improvement over the current methodology, should not be considered the final solution. They encourage users to explore additional optimizations and validate them on larger datasets. The authors have made their code available to facilitate future improvements.
Statistics
On the test set, the minimum IoU is 3% with the original method and 16% with the new method.