Core Concepts
The proposed knowledge distillation framework transfers discriminative part-level knowledge from heterogeneous high-quality skeletons to strengthen the representations of low-quality skeletons, enabling accurate action recognition even under heavy noise.
Abstract
The paper addresses the challenge of skeleton-based action recognition using low-quality skeleton data, which often contains missing or inaccurate joints. The authors propose a general knowledge distillation framework that employs a teacher-student model setup. The teacher model is pre-trained on high-quality skeletons, while the student model handles low-quality skeletons.
To bridge the gap between heterogeneous high-quality and low-quality skeletons, the authors present a novel part-based skeleton matching strategy. This strategy exploits shared body parts to facilitate local action pattern learning. An action-specific part matrix is developed to emphasize critical parts for different actions, enabling the student model to distill discriminative part-level knowledge.
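As a rough illustration of how shared body parts can bridge skeletons with different joint layouts, the sketch below pools joint features into a common set of parts and then reweights them with one row of an action-specific part matrix. The part names, the 7-joint and 5-joint layouts, the feature values, and the weights are all hypothetical. The paper's actual partition and matrix are not given in this summary.

```python
# Hypothetical shared parts; the paper's actual partition is not specified here.
PARTS = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]

# Hypothetical joint groupings for two different skeleton layouts.
HIGH_QUALITY_LAYOUT = {  # 7 joints
    "torso": [0, 1, 2],
    "left_arm": [3],
    "right_arm": [4],
    "left_leg": [5],
    "right_leg": [6],
}
LOW_QUALITY_LAYOUT = {  # 5 joints
    "torso": [0],
    "left_arm": [1],
    "right_arm": [2],
    "left_leg": [3],
    "right_leg": [4],
}


def pool_parts(joint_feats, layout):
    """Average the joint feature vectors that belong to each shared part."""
    dim = len(joint_feats[0])
    pooled = []
    for part in PARTS:
        joints = layout[part]
        pooled.append(
            [sum(joint_feats[j][d] for j in joints) / len(joints) for d in range(dim)]
        )
    return pooled


def weight_parts(part_feats, part_weights):
    """Scale each part feature by its weight for one action (one matrix row)."""
    return [[w * x for x in feat] for feat, w in zip(part_feats, part_weights)]


# Toy 2-D joint features for each layout.
high = [[float(j), 1.0] for j in range(7)]
low = [[float(j), 2.0] for j in range(5)]

high_parts = pool_parts(high, HIGH_QUALITY_LAYOUT)
low_parts = pool_parts(low, LOW_QUALITY_LAYOUT)

# Hypothetical matrix row for an arm-dominated action such as waving.
wave_row = [0.1, 0.4, 0.4, 0.05, 0.05]
weighted_low = weight_parts(low_parts, wave_row)
```

After pooling, both skeletons are described by the same five part features regardless of how many joints each layout has, which is what makes part-level comparison between teacher and student possible.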
Furthermore, a novel part-level multi-sample contrastive loss is introduced to transfer knowledge from multiple high-quality skeletons to low-quality ones. Because the loss does not rely on one-to-one pairs, the framework can also train on low-quality skeletons that have no corresponding high-quality match.
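One way to picture a multi-sample contrastive objective is a supervised, InfoNCE-style loss in which every high-quality sample of the same action acts as a positive for a given low-quality sample. The sketch below is a generic formulation for a single body part, not the paper's exact loss. The function names, the cosine similarity, and the temperature value are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def multi_sample_contrastive(student_part, teacher_parts, teacher_labels, label,
                             temperature=0.1):
    """Generic multi-positive contrastive loss for one part feature.

    Positives are all teacher features whose action label equals `label`;
    every other teacher feature in the batch acts as a negative.
    """
    logits = [cosine(student_part, t) / temperature for t in teacher_parts]
    # Numerically stable log-sum-exp over all teacher samples.
    peak = max(logits)
    log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
    positives = [l for l, y in zip(logits, teacher_labels) if y == label]
    if not positives:
        return 0.0  # no same-action teacher sample available in this batch
    return -sum(l - log_denom for l in positives) / len(positives)


# Toy batch: two teacher samples of action 0 and one of action 1.
teacher_parts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
teacher_labels = [0, 0, 1]
student_part = [1.0, 0.0]

loss_matching = multi_sample_contrastive(student_part, teacher_parts, teacher_labels, 0)
loss_mismatched = multi_sample_contrastive(student_part, teacher_parts, teacher_labels, 1)
```

Because positives are chosen by action label rather than by sample identity, a low-quality skeleton needs only some high-quality skeleton of the same action in the batch, not its own paired counterpart. In a full model such a term would presumably be accumulated over all parts.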
Comprehensive experiments on the NTU-RGB+D, Penn Action, and SYSU 3D HOI datasets demonstrate the effectiveness of the proposed knowledge distillation framework in enhancing action recognition performance using low-quality skeleton data.
Stats
The teacher model trained on high-quality skeletons achieves 84.56% and 89.46% accuracy on the NTU-RGB+D dataset, 90.54% Top-1 and 98.97% Top-5 accuracy on the Penn Action dataset, and 88.54% and 88.96% accuracy on the SYSU 3D HOI dataset.
The student model trained on low-quality skeletons without knowledge distillation achieves 79.98% and 84.88% accuracy on the NTU-RGB+D dataset, 83.43% Top-1 and 97.19% Top-5 accuracy on the Penn Action dataset, and 84.33% and 84.00% accuracy on the SYSU 3D HOI dataset.
With the proposed knowledge distillation framework, the student model achieves 83.31% and 88.13% accuracy on the NTU-RGB+D dataset, 87.08% Top-1 and 98.50% Top-5 accuracy on the Penn Action dataset, and 87.14% and 86.68% accuracy on the SYSU 3D HOI dataset.
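To make the size of the improvement easier to read, the short script below derives, from the figures above, the absolute gain from distillation and the share of the teacher-student gap that it recovers. The dictionary keys and the `summarize` helper are illustrative names. The two unlabeled figures per dataset are simply called "first" and "second", since the evaluation settings are not named here.

```python
# (teacher, student baseline, student with distillation), copied from the stats above.
RESULTS = {
    "NTU-RGB+D (first figure)": (84.56, 79.98, 83.31),
    "NTU-RGB+D (second figure)": (89.46, 84.88, 88.13),
    "Penn Action Top-1": (90.54, 83.43, 87.08),
    "Penn Action Top-5": (98.97, 97.19, 98.50),
    "SYSU 3D HOI (first figure)": (88.54, 84.33, 87.14),
    "SYSU 3D HOI (second figure)": (88.96, 84.00, 86.68),
}


def summarize(teacher, baseline, distilled):
    """Return (gain, gap, percent of gap recovered) in percentage points."""
    gain = round(distilled - baseline, 2)  # improvement from distillation
    gap = round(teacher - baseline, 2)     # teacher-student gap before distillation
    recovered = round(100 * gain / gap, 1)
    return gain, gap, recovered


for name, (teacher, baseline, distilled) in RESULTS.items():
    gain, gap, recovered = summarize(teacher, baseline, distilled)
    print(f"{name}: +{gain:.2f} points, {recovered:.1f}% of a {gap:.2f}-point gap")
```

On the first NTU-RGB+D figure, for example, distillation adds 3.33 points to the student, against a gap of 4.58 points between the teacher and the undistilled student.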
Quotes
"The proposed framework employs a teacher-student model setup, where a teacher model trained on high-quality skeletons guides the learning of a student model that handles low-quality skeletons."
"To bridge the gap between heterogeneous high-quality and low-quality skeletons, we present a novel part-based skeleton matching strategy, which exploits shared body parts to facilitate local action pattern learning."
"A novel part-level multi-sample contrastive loss achieves knowledge transfer from multiple high-quality skeletons to low-quality ones, which enables the proposed knowledge distillation framework to include training low-quality skeletons that lack corresponding high-quality matches."