Core Concepts
SiT-MLP, a novel MLP-based model, can effectively capture spatial-temporal co-occurrence features for skeleton-based action recognition without relying on elaborate human priors or complex feature aggregation mechanisms.
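To make "capturing spatial-temporal co-occurrence with MLPs" concrete, here is a minimal sketch of generic token mixing. It assumes a skeleton sequence stored as nested lists `x[t][v][c]` (T frames, V joints, C channels). Mixing along the joint axis lets every joint's output depend on every other joint, and mixing along the frame axis does the same across time, with no predefined skeleton graph involved. The names `mix_joints` and `mix_frames` and the weights are hypothetical illustrations in the style of MLP-Mixer, not SiT-MLP's actual layers.

```python
# Minimal pure-Python sketch of MLP-style token mixing on a skeleton sequence.
# Shapes: x[t][v][c] with T frames, V joints, C channels.
# Weight matrices are hypothetical placeholders, not values from the paper.
from typing import List

Tensor3 = List[List[List[float]]]
Matrix = List[List[float]]


def mix_joints(x: Tensor3, w: Matrix) -> Tensor3:
    """Spatial mixing: y[t][v][c] = sum_u w[v][u] * x[t][u][c]."""
    T, V, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[v][u] * x[t][u][c] for u in range(V)) for c in range(C)]
             for v in range(V)] for t in range(T)]


def mix_frames(x: Tensor3, w: Matrix) -> Tensor3:
    """Temporal mixing: y[t][v][c] = sum_s w[t][s] * x[s][v][c]."""
    T, V, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[t][s] * x[s][v][c] for s in range(T)) for c in range(C)]
             for v in range(V)] for t in range(T)]
```

For example, with `x = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]]`, summing over joints with an all-ones 3×3 matrix gives 6.0 and 15.0 for the two frames, and then averaging frames with `[[0.5, 0.5], [0.5, 0.5]]` gives 10.5 everywhere. In a trained model, learned weights would replace these fixed matrices.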
Abstract
The paper proposes a novel Spatial Topology Gating Unit (STGU) as the core component of the SiT-MLP model for skeleton-based action recognition. The key highlights are:
STGU is an MLP-based structure that captures point-wise, sample-specific topology features without relying on human priors. It introduces a gate-based feature interaction mechanism that activates features point-to-point according to a generated attention map.
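The gating idea can be sketched as follows, assuming one frame of features `x` with V joints and an even number C of channels. The sketch splits the channels in half, builds a joint-to-joint map from one half, aggregates that half with the map, and passes the result through a sigmoid to gate the other half element by element, so every (joint, channel) entry gets its own gate. The channel split follows gMLP's spatial gating unit; the scaled dot-product map is an assumption standing in for STGU's MLP-based map generation, whose exact form is not described here, and `topology_gate` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def topology_gate(x: Matrix) -> Matrix:
    """Gate half of the channels with a map built from the other half.

    x has shape V x C (joints x channels) with C even; returns V x C/2.
    """
    V, C = len(x), len(x[0])
    h = C // 2
    x1 = [row[:h] for row in x]  # features to be gated
    x2 = [row[h:] for row in x]  # features used to build the topology map
    # Sample-specific joint-to-joint map (assumed form): softmax of scaled dot products.
    scale = 1.0 / math.sqrt(h)
    attn = [softmax([scale * sum(a * b for a, b in zip(x2[v], x2[u]))
                     for u in range(V)]) for v in range(V)]
    # Aggregate x2 over joints using the map.
    agg = [[sum(attn[v][u] * x2[u][c] for u in range(V)) for c in range(h)]
           for v in range(V)]
    # Point-wise gating: each (joint, channel) entry of x1 is scaled by its own gate in (0, 1).
    return [[x1[v][c] * sigmoid(agg[v][c]) for c in range(h)] for v in range(V)]
```

With `x = [[1.0, 0.0], [2.0, 0.0]]` the map-building half is all zeros, so the map is uniform, every gate equals 0.5, and the output is `[[0.5], [1.0]]`. Because the gate is computed from the input itself, the effective topology changes from sample to sample.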
SiT-MLP, presented as the first MLP-based model for skeleton-based action recognition, is built on STGU. It achieves performance competitive with previous GCN-based and Transformer-based methods on three large-scale datasets while requiring significantly fewer parameters and less computation.
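One way to see why an MLP design can stay light is to count the weights of a single block. The sketch below assumes a hypothetical block made of a channel MLP (C to rC to C, with biases) plus one V×V spatial mixing matrix with a per-joint bias; this layout and the `block_params` name are illustrative assumptions, not SiT-MLP's published configuration.

```python
def block_params(channels: int, joints: int, expansion: int = 4) -> dict:
    """Count weights and biases in a hypothetical MLP block.

    channel_mlp: Linear(C -> rC) followed by Linear(rC -> C), both with biases.
    spatial_mix: one V x V joint-mixing matrix plus a per-joint bias.
    """
    hidden = channels * expansion
    channel_mlp = (channels * hidden + hidden) + (hidden * channels + channels)
    spatial_mix = joints * joints + joints
    return {
        "channel_mlp": channel_mlp,
        "spatial_mix": spatial_mix,
        "total": channel_mlp + spatial_mix,
    }
```

With 64 channels, expansion 4, and 25 joints, `block_params(64, 25)` gives 33,088 channel-MLP parameters and 650 spatial-mixing parameters (33,738 in total). The spatial term grows with V² and does not depend on the channel width, so modeling joint topology this way adds little to the parameter budget.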
Extensive experiments and ablation studies demonstrate the contribution of SiT-MLP's individual components, including the sample-specific and sample-generic aggregation modules and the temporal-wise and channel-wise topology modeling.
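The difference between sample-generic and sample-specific aggregation, and what channel-wise topology modeling adds, can be sketched under simple assumptions. A sample-generic map is a V×V matrix shared by all samples (learned in practice). A sample-specific map is computed from the current sample, here via a softmax over joint dot products, which is an assumed stand-in. The two maps are summed before joints are aggregated. In the channel-wise variant, channels are split into groups and each group builds its own sample-specific map. The names `specific_map`, `aggregate`, and `aggregate_channelwise` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def specific_map(x: Matrix) -> Matrix:
    """Sample-specific V x V map built from this sample's joint features x (V x C)."""
    V = len(x)
    return [softmax([sum(a * b for a, b in zip(x[v], x[u])) for u in range(V)])
            for v in range(V)]


def aggregate(x: Matrix, generic: Matrix) -> Matrix:
    """Aggregate joints with a shared (generic) map plus a per-sample (specific) map."""
    spec = specific_map(x)
    V, C = len(x), len(x[0])
    a = [[generic[v][u] + spec[v][u] for u in range(V)] for v in range(V)]
    return [[sum(a[v][u] * x[u][c] for u in range(V)) for c in range(C)]
            for v in range(V)]


def aggregate_channelwise(x: Matrix, generic: Matrix, groups: int) -> Matrix:
    """Channel-wise variant: each channel group builds its own specific map."""
    V, C = len(x), len(x[0])
    assert C % groups == 0, "channels must divide evenly into groups"
    size = C // groups
    out = [[0.0] * C for _ in range(V)]
    for g in range(groups):
        cols = slice(g * size, (g + 1) * size)
        part = aggregate([row[cols] for row in x], generic)
        for v in range(V):
            out[v][cols] = part[v]  # write this group's result back in place
    return out
```

With `x = [[1.0, 1.0], [1.0, 1.0]]` and an identity generic map, the specific map is uniform (0.5 everywhere), the combined map is `[[1.5, 0.5], [0.5, 1.5]]`, and `aggregate` returns 2.0 in every entry. Looping over frames instead of channel groups would give one map per frame, which is one possible reading of temporal-wise topology modeling.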
SiT-MLP generalizes better than GCN-based methods: its performance drops relatively little when it is tested on skeletons extracted from RGB videos recorded in complex real-world environments.
Stats
The paper does not provide specific numerical data or statistics to support its key claims; the focus is on the model architecture and its effectiveness relative to previous methods.
Quotes
The content contains no striking quotes that directly support the authors' key claims.