The proposed STAR method aligns the skeleton and semantic spaces at a fine-grained level through three steps: decomposing the skeleton into fine-grained parts, introducing part-level motion descriptions as side information, and using dual prompts to improve intra-class compactness and inter-class separability. This alignment enables superior performance in zero-shot and generalized zero-shot skeleton action recognition.
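As a rough illustration of what part-level alignment can look like at inference time, the sketch below scores each unseen class by averaging cosine similarities between per-part skeleton features and per-part text embeddings. The part names, the toy vectors, the averaging rule, and the helper names (`cosine`, `class_score`, `predict`) are all assumptions made for illustration and are not taken from STAR; the dual prompts are omitted entirely.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_score(skeleton_parts, text_parts):
    """Average part-wise cosine similarity (hypothetical scoring rule)."""
    sims = [cosine(skeleton_parts[p], text_parts[p]) for p in skeleton_parts]
    return sum(sims) / len(sims)

def predict(skeleton_parts, class_text_parts):
    """Return the class whose part descriptions best match the skeleton."""
    return max(
        class_text_parts,
        key=lambda c: class_score(skeleton_parts, class_text_parts[c]),
    )

# Toy 2-D features; real features would come from trained encoders.
skeleton = {"arms": [1.0, 0.0], "legs": [0.0, 1.0]}
unseen_classes = {
    "wave": {"arms": [0.9, 0.1], "legs": [0.1, 0.9]},
    "kick": {"arms": [0.0, 1.0], "legs": [1.0, 0.0]},
}
prediction = predict(skeleton, unseen_classes)
```

Because the class embeddings come from text rather than from labelled skeletons, a class never seen during training can still be scored, which is the property zero-shot recognition relies on.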
SiT-MLP, a novel MLP-based model, can effectively capture spatial-temporal co-occurrence features for skeleton-based action recognition without relying on elaborate human priors or complex feature aggregation mechanisms.
The proposed MSST-GCN model strengthens the modeling of skeleton sequences for action recognition by combining spatial self-attention with adaptive topology and temporal self-attention, followed by multi-scale convolution networks, to capture long-range spatial and temporal dependencies.
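To make the spatial self-attention idea concrete, the following minimal sketch applies scaled dot-product attention across the joints of a single frame and adds a topology matrix to the attention logits. Treating the adaptive topology as an additive bias, reusing the raw joint features as queries, keys, and values, and the names `spatial_attention` and `topology_bias` are assumptions for illustration, not the MSST-GCN implementation; temporal attention and the multi-scale convolutions are left out.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(features, topology_bias):
    """Attend over joints within one frame.

    features: list of J joint feature vectors (all the same length).
    topology_bias: J x J matrix added to the attention logits; in a
        trained model this would be learnable (assumed form).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    for i, query in enumerate(features):
        logits = [
            sum(a * b for a, b in zip(query, key)) / scale + topology_bias[i][j]
            for j, key in enumerate(features)
        ]
        weights.append(softmax(logits))
    # Each output joint is a weighted sum over all joints, so distant
    # joints can influence each other in a single layer.
    output = [
        [sum(w[j] * features[j][d] for j in range(len(features))) for d in range(dim)]
        for w in weights
    ]
    return output, weights

# Three toy joints; the bias favours the chain 0-1-2.
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
attended, attn = spatial_attention(joints, bias)
```

Since every joint receives a nonzero weight from every other joint, the receptive field is not limited to skeletal neighbours, while the bias still lets the model prefer physically or functionally related joints.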