Core Concept
This study explores various deep learning models to enhance human action recognition in video data, emphasizing the importance of integrating spatial and temporal information for optimal performance.
Abstract
Human action recognition in videos is crucial for applications such as surveillance and sports analytics. This study evaluates CNNs, RNNs, and Two-Stream ConvNets on the UCF101 video dataset, highlighting the superior performance of Two-Stream ConvNets in capturing both spatial and temporal dimensions. The research suggests that composite models can improve the effectiveness of human action recognition.
Statistics
CNNs achieved an accuracy of 88%, with precision, recall, and F1 scores of 85.20%, 88.15%, and 85.56%, respectively.
RNNs achieved an accuracy of 9%, with precision, recall, and F1 scores of 5.91%, 9%, and 6.30%, respectively.
Two-Stream ConvNets reached an accuracy of 93.30%, with precision, recall, and F1 scores of 93.55%, 93.30%, and 93.32%, respectively.
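The two-stream design that achieves the best results above can be sketched as follows. This is a minimal illustration in PyTorch, not the study's actual architecture: the backbone sizes, input resolution, and the 20-channel stacked-optical-flow input are illustrative assumptions. The key idea is that a spatial stream classifies an RGB frame, a temporal stream classifies stacked optical-flow fields, and their class probabilities are fused by averaging (late fusion).

```python
import torch
import torch.nn as nn


class Stream(nn.Module):
    """A tiny ConvNet standing in for one stream's backbone (toy sizes)."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to a 16-dim vector
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


class TwoStreamConvNet(nn.Module):
    """Late fusion of a spatial (RGB) and a temporal (optical-flow) stream."""

    def __init__(self, num_classes: int = 101, flow_channels: int = 20):
        super().__init__()
        self.spatial = Stream(3, num_classes)              # one RGB frame
        self.temporal = Stream(flow_channels, num_classes)  # e.g. 10 flow frames x 2 (x, y)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Each stream predicts class probabilities independently; average them.
        return (self.spatial(rgb).softmax(-1) + self.temporal(flow).softmax(-1)) / 2


model = TwoStreamConvNet()
rgb = torch.randn(2, 3, 112, 112)    # batch of 2 RGB frames
flow = torch.randn(2, 20, 112, 112)  # matching stacked optical-flow inputs
probs = model(rgb, flow)
print(probs.shape)  # one probability per UCF101 class: (2, 101)
```

Because each stream produces a valid probability distribution over the 101 UCF101 classes, the averaged output is also a distribution, so the fusion step needs no extra normalization.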
Quotes
"The results underscore the potential of composite models in achieving robust human action recognition."
"Two-Stream ConvNets exhibit superior performance by integrating spatial and temporal dimensions."