This work efficiently captures skeletal-temporal relations to improve action recognition.
Large language models can be effectively leveraged as powerful action recognizers by projecting skeleton sequences into "action sentences" that are compatible with the models' pre-trained knowledge.
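One way such a projection can work (a minimal sketch, not the paper's actual tokenizer: the bin count, pseudo-word format, and frame separator are all illustrative assumptions) is to quantize each normalized joint coordinate into discrete bins and emit one pseudo-word per joint per frame:

```python
def skeleton_to_action_sentence(frames, n_bins=20):
    """Map a skeleton sequence to a textual "action sentence" by quantizing
    each joint coordinate into one of n_bins discrete bins.

    frames: list of frames; each frame is a list of (x, y) joint coordinates
    already normalized to [0, 1]. Joint j becomes a pseudo-word like
    "j3_12_7"; frames are separated by " | ". (Hypothetical format.)
    """
    words = []
    for frame in frames:
        frame_words = []
        for j, (x, y) in enumerate(frame):
            bx = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0 into last bin
            by = min(int(y * n_bins), n_bins - 1)
            frame_words.append(f"j{j}_{bx}_{by}")
        words.append(" ".join(frame_words))
    return " | ".join(words)
```

The resulting token sequence is discrete and order-preserving, which is what makes it consumable by a text-pretrained model.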
A novel Spiking Graph Convolutional Network (SGN) with multimodal fusion and knowledge distillation is proposed to achieve efficient and accurate skeleton-based action recognition.
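The energy efficiency of spiking networks comes from binary, event-driven activations. A minimal sketch of the leaky integrate-and-fire (LIF) dynamics such networks are typically built on (generic LIF, assumed here; the paper's exact neuron model and hyperparameters may differ):

```python
def lif_spike_train(inputs, tau=2.0, v_th=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential leaks by a
    factor 1/tau each step, integrates the input current, and emits a
    binary spike with a hard reset whenever it reaches threshold v_th."""
    v, spikes = 0.0, []
    for x in inputs:
        v = v / tau + x        # leak, then integrate the input current
        if v >= v_th:
            spikes.append(1)   # fire
            v = 0.0            # hard reset after firing
        else:
            spikes.append(0)
    return spikes
```

Because downstream layers see only these 0/1 spikes, multiplications largely reduce to additions, which is the source of the claimed efficiency.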
The proposed Hybrid Dual-Branch Network (HDBN) effectively combines Graph Convolutional Networks (GCNs) and Transformers to achieve robust and accurate skeleton-based action recognition.
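A common way to combine two such branches is late fusion: each branch classifies independently and their class distributions are averaged. A minimal sketch under that assumption (the fusion weight `alpha` is illustrative; HDBN's exact fusion scheme may differ):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_branch_scores(gcn_logits, transformer_logits, alpha=0.5):
    """Late fusion: weighted average of the GCN branch's and the
    Transformer branch's per-class probability distributions."""
    p_gcn = softmax(gcn_logits)
    p_trf = softmax(transformer_logits)
    return [alpha * g + (1 - alpha) * t for g, t in zip(p_gcn, p_trf)]
```

Late fusion keeps the branches independently trainable and lets the GCN's joint-topology bias and the Transformer's long-range temporal modeling complement each other at the score level.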
The proposed Improved Graph Pooling Network (IGPN) incorporates a region-aware pooling strategy, a cross-fusion block, and an information-supplement module to enhance the representational power of skeleton features while reducing computational overhead.
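Region-aware pooling reduces cost by collapsing the joint dimension into a handful of body parts. A minimal sketch assuming mean pooling over predefined joint groups (the region definitions and pooling operator here are illustrative, not IGPN's exact design):

```python
def region_pool(joint_feats, regions):
    """Mean-pool per-joint feature vectors into body-part (region) features.

    joint_feats: list of J feature vectors (lists of floats), one per joint.
    regions: list of joint-index lists, e.g. [[0, 1], [2, 3, 4]] for two
    hypothetical body parts (arm, leg, ...).
    Returns one pooled feature vector per region.
    """
    dim = len(joint_feats[0])
    pooled = []
    for idx in regions:
        pooled.append([sum(joint_feats[j][d] for j in idx) / len(idx)
                       for d in range(dim)])
    return pooled
```

Downstream layers then operate on R region nodes instead of J joints (R << J), which is where the computational savings come from.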
The proposed knowledge distillation framework distills discriminative part-level knowledge from heterogeneous high-quality skeletons to enhance representations of low-quality skeletons, enabling accurate action recognition even under severe noise.
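The standard objective for this kind of teacher-student transfer is a soft-label distillation loss; a minimal sketch using the usual temperature-scaled KL divergence (the temperature value and the plain-KL form are generic assumptions, not necessarily the paper's exact loss):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation: KL divergence between the teacher's and the
    student's temperature-softened class distributions, scaled by T^2
    (the usual correction for the 1/T^2 gradient shrinkage)."""
    p = softmax([l / T for l in teacher_logits])   # teacher soft targets
    q = softmax([l / T for l in student_logits])   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

Here the teacher would be fed high-quality skeletons and the student the corresponding low-quality ones, so the student learns to reproduce discriminative structure its noisy input alone does not reveal.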
ReL-SAR, a lightweight convolutional transformer model, leverages self-supervised learning with Bootstrap Your Own Latent (BYOL) to extract robust and generalizable features from skeleton sequences for efficient action recognition.
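BYOL trains without negative pairs: an online network predicts the representation a slowly moving target network produces for another augmentation of the same sequence. A minimal sketch of its two core pieces, the normalized regression loss and the exponential-moving-average target update (generic BYOL, assumed; ReL-SAR's encoder specifics are not shown):

```python
import math

def byol_loss(online_pred, target_proj):
    """BYOL regression objective: mean squared error between L2-normalized
    online predictions and target projections, which equals
    2 - 2 * cosine_similarity."""
    def l2_normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_w, online_w, tau=0.996):
    """Target-network weights track the online network by exponential
    moving average; no gradients flow into the target branch."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_w, online_w)]
```

The slow EMA target is what prevents the representation from collapsing despite the absence of negative samples, which is why BYOL suits label-scarce skeleton data.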