Core Concept
Gait recognition can be improved by fusing silhouette and skeleton representations and by refining the skeleton data using temporal consistency from silhouettes.
Summary
The paper proposes GaitSTR, a method for gait recognition that combines silhouette and skeleton representations. The key insights are:
- Silhouettes lack detailed part information when there is overlap between body segments, and are affected by carried objects and clothing. Skeletons provide more accurate part information but are sensitive to occlusions and low-quality images, causing inconsistencies in frame-wise results.
- GaitSTR refines the skeleton representation by leveraging the temporal consistency between silhouettes and skeletons. It introduces two-level fusion: internal fusion within skeletons (between joints and bones) and cross-modal correction with temporal guidance from silhouettes.
- The internal fusion uses self-correction residual blocks to improve consistency between joints and bones in the skeleton representation. The cross-modal fusion uses silhouette features to predict relative changes for joints and bones, refining the skeleton.
- Experiments on four public gait recognition datasets show that the refined skeletons, when combined with silhouettes, outperform other state-of-the-art methods that use skeletons and silhouettes.
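The refinement described above can be illustrated with a minimal Python sketch. It is not the paper's network: the four-joint chain, the function names, and the neighbour-averaging correction are all assumptions made for illustration. In GaitSTR the relative change is predicted from silhouette features; here a simple temporal average stands in for that prediction, so that the residual update and the joint-to-bone relationship can be followed on concrete numbers. Bones are taken as differences between connected joints, which is a common convention and is assumed here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Frame = List[Point]  # one 2D point per joint

# Hypothetical 4-joint chain (hip -> knee -> ankle -> toe).
# This layout is an assumption, not the skeleton format used in the paper.
EDGES = [(0, 1), (1, 2), (2, 3)]


def bones_from_joints(frame: Frame, edges=EDGES) -> List[Point]:
    """Bone vector = child joint minus parent joint (assumed convention)."""
    return [(frame[c][0] - frame[p][0], frame[c][1] - frame[p][1])
            for p, c in edges]


def smoothing_delta(seq: List[Frame], t: int) -> Frame:
    """Stand-in for the learned correction.

    Returns, per joint, the offset that moves frame t onto the mean of its
    temporal neighbours. The paper predicts this relative change from
    silhouette features instead; this function only mimics the interface.
    """
    prev = seq[max(t - 1, 0)]
    cur = seq[t]
    nxt = seq[min(t + 1, len(seq) - 1)]
    return [((p[0] + n[0]) / 2 - c[0], (p[1] + n[1]) / 2 - c[1])
            for p, c, n in zip(prev, cur, nxt)]


def refine(frame: Frame, delta: Frame) -> Frame:
    """Residual update: refined joint = original joint + relative change."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(frame, delta)]


# Three frames of a static pose; the ankle (joint 2) jitters in the middle frame.
seq = [
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 2.0)],
    [(0.0, 0.0), (0.0, 1.0), (4.0, 2.0), (1.0, 2.0)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 2.0)],
]

refined = refine(seq[1], smoothing_delta(seq, 1))
print(refined)                     # joints after the residual update
print(bones_from_joints(refined))  # bones recomputed from the refined joints
```

Because bones are derived from joints, a single erroneous joint corrupts every bone attached to it, which is one way to see why the summary stresses consistency between the two.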
Statistics
The paper reports 98.4% rank-1 accuracy on the CASIA-B dataset, a 23.8% relative reduction in error rate compared to the best 2D convolution-based method.
The paper reports 90.8% rank-1 accuracy on the OUMVLP dataset, a small improvement over the previous state-of-the-art.
The paper reports 65.1% rank-1 accuracy on the Gait3D dataset, a 3.1% improvement over the previous state-of-the-art.
The paper reports 89.6% rank-1 accuracy on the GREW dataset, a 3.9% improvement over the previous state-of-the-art.
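The CASIA-B figure mixes an accuracy with a relative error reduction, which is easy to misread. The short Python sketch below shows how the two relate, assuming the usual definition of error rate as 100% minus rank-1 accuracy. The function names are illustrative, and the baseline accuracy it produces (about 97.9%) is only what the two stated numbers imply, not a value given in this summary.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction by which the error rate (100 - accuracy) shrinks."""
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (base_err - new_err) / base_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Invert the formula above: which baseline accuracy gives this reduction?"""
    new_err = 100.0 - new_acc
    return 100.0 - new_err / (1.0 - reduction)


# Numbers stated for CASIA-B: 98.4% accuracy, 23.8% relative error reduction.
baseline = implied_baseline_accuracy(98.4, 0.238)
print(round(baseline, 1))
```

A 23.8% relative reduction therefore corresponds to roughly half a percentage point of accuracy at this operating range, which is why relative error is often quoted when accuracies are already close to 100%.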
Quotes
"Silhouettes suffer from variations due to clothing and carried objects, as shown in Figure 1 (a), introducing external ambiguity, with segmented parts of a binarized silhouette being unavailable."
"Skeletons, on the other hand, include inconsistencies across frames in a sequence due to erroneous joint predictions, as depicted in Figure 1 (b), thereby reducing the accuracy of gait recognition."
"We enhance the quality of skeletons by employing silhouettes to rectify the jitters while retaining necessary identity information for more accurate gait recognition."