ActNetFormer learns robust action representations from both labeled and unlabeled video data by combining cross-architecture pseudo-labeling with contrastive learning. The framework pairs 3D Convolutional Neural Networks (3D CNNs) with video transformers (ViT) to capture complementary spatial and temporal aspects of actions, achieving state-of-the-art performance on semi-supervised video action recognition benchmarks.
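As a rough illustration of how cross-architecture pseudo-labeling and a contrastive objective can be combined, the sketch below assumes two backbones (a 3D CNN and a video transformer) that each map a clip tensor to a pair of (logits, embedding). All names and hyperparameters (`cross_architecture_step`, `tau`, `threshold`) are illustrative placeholders, not taken from the ActNetFormer paper.

```python
# Hedged sketch: cross-architecture pseudo-labeling + contrastive alignment.
# cnn_model and vit_model are assumed to return (logits, embedding) for an
# input clip batch of shape (B, C, T, H, W); this is not the paper's exact code.
import torch
import torch.nn.functional as F

def cross_architecture_step(cnn_model, vit_model, labeled_clips, labels,
                            unlabeled_clips, tau=0.5, threshold=0.95):
    # Supervised loss on the labeled batch for both architectures.
    cnn_logits_l, _ = cnn_model(labeled_clips)
    vit_logits_l, _ = vit_model(labeled_clips)
    sup_loss = F.cross_entropy(cnn_logits_l, labels) + F.cross_entropy(vit_logits_l, labels)

    # Each architecture generates pseudo-labels for the unlabeled clips.
    with torch.no_grad():
        cnn_probs = F.softmax(cnn_model(unlabeled_clips)[0], dim=1)
        vit_probs = F.softmax(vit_model(unlabeled_clips)[0], dim=1)
    cnn_conf, cnn_pl = cnn_probs.max(dim=1)
    vit_conf, vit_pl = vit_probs.max(dim=1)

    cnn_logits_u, cnn_emb = cnn_model(unlabeled_clips)
    vit_logits_u, vit_emb = vit_model(unlabeled_clips)

    # Confidence-filtered cross pseudo-label losses: the CNN teaches the
    # transformer and vice versa, keeping only confident predictions.
    mask_cnn = (cnn_conf > threshold).float()
    mask_vit = (vit_conf > threshold).float()
    pl_loss = (F.cross_entropy(vit_logits_u, cnn_pl, reduction="none") * mask_cnn).sum() / mask_cnn.sum().clamp(min=1.0)
    pl_loss = pl_loss + (F.cross_entropy(cnn_logits_u, vit_pl, reduction="none") * mask_vit).sum() / mask_vit.sum().clamp(min=1.0)

    # Contrastive term pulling the two architectures' embeddings of the same
    # clip together while pushing apart embeddings of different clips.
    z1 = F.normalize(cnn_emb, dim=1)
    z2 = F.normalize(vit_emb, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    con_loss = F.cross_entropy(logits, targets)

    return sup_loss + pl_loss + con_loss
```

The weighting of the three terms and the confidence filtering strategy are design choices; the actual framework may combine them differently.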
LaIAR is a framework that leverages knowledge from language models to enhance both the recognition accuracy and the interpretability of video models.
SNRO, a framework for video class-incremental learning, slightly shifts the features of new classes during training to substantially improve performance on old classes while consuming the same memory as existing methods.
UNITE introduces a novel approach to unsupervised video domain adaptation, leveraging masked pre-training and collaborative self-training to achieve significant performance improvements across domains.
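To make the masked pre-training stage concrete, the following is a generic MAE-style masked-modeling sketch over spatio-temporal patch tokens of unlabeled target-domain clips. The `encoder`, `decoder`, and reconstruction objective here are hypothetical placeholders; UNITE's actual pre-training objective and modules may differ, and the collaborative self-training stage is not shown.

```python
# Hedged sketch: generic masked pre-training on unlabeled target-domain video,
# operating on pre-computed patch tokens of shape (B, N, D). Not the paper's code.
import torch
import torch.nn.functional as F

def masked_pretrain_step(encoder, decoder, patch_tokens, mask_ratio=0.75):
    B, N, D = patch_tokens.shape
    num_keep = int(N * (1.0 - mask_ratio))

    # Randomly keep a subset of tokens per clip; the rest are masked out.
    noise = torch.rand(B, N, device=patch_tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]
    visible = torch.gather(patch_tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))

    # Encode visible tokens only; the decoder predicts the full token sequence.
    latent = encoder(visible)                     # (B, num_keep, D), assumed shape
    reconstructed = decoder(latent, keep_idx, N)  # (B, N, D), assumed signature

    # Reconstruction loss computed on masked positions only.
    mask = torch.ones(B, N, device=patch_tokens.device)
    mask.scatter_(1, keep_idx, 0.0)
    per_token = F.mse_loss(reconstructed, patch_tokens, reduction="none").mean(dim=-1)
    return (per_token * mask).sum() / mask.sum()
```

After this pre-training, a self-training loop would fine-tune on labeled source videos and iteratively pseudo-label confident target clips, analogous to the pseudo-labeling pattern sketched earlier.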