Core Concept
Utilizing spatio-temporal proximity and a dual-path architecture enhances panoramic activity recognition.
Summary
This paper introduces the Social Proximity-aware Dual-Path Network (SPDP-Net) for Panoramic Activity Recognition (PAR). It tackles the challenge of recognizing human activities at multiple granularities by exploiting spatio-temporal proximity and a dual-path architecture. Extensive experiments validate the effectiveness of SPDP-Net, which achieves state-of-the-art performance on the JRDB-PAR dataset.
Introduction
Importance of understanding human activity in videos.
Focus on Human Activity Recognition (HAR), Group Activity Recognition (GAR), and Panoramic Activity Recognition (PAR).
Challenges in PAR
Panoramic scenes are complex: many people perform diverse activities at different granularities (individual, social group, and global).
Spatial proximity alone is not enough; spatio-temporal proximity is needed to understand social dynamics accurately.
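To make the distinction concrete, here is a minimal Python sketch contrasting single-frame spatial proximity with proximity measured over time. It assumes a hypothetical input of per-frame 2-D positions and a distance threshold; the functions `spatial_proximity` and `spatio_temporal_proximity` and the mean-distance criterion are illustrative choices, not how SPDP-Net itself computes proximity.

```python
import math

# Hypothetical input format: tracks[t][i] is the (x, y) position of person i at frame t.


def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def spatial_proximity(frame, threshold):
    """Single-frame proximity: 1 if persons i and j are closer than `threshold` in this frame."""
    n = len(frame)
    return [[1 if i != j and distance(frame[i], frame[j]) < threshold else 0
             for j in range(n)] for i in range(n)]


def spatio_temporal_proximity(tracks, threshold):
    """Proximity over time: 1 if the mean distance between i and j across all frames is below `threshold`."""
    n, num_frames = len(tracks[0]), len(tracks)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                mean_d = sum(distance(f[i], f[j]) for f in tracks) / num_frames
                matrix[i][j] = 1 if mean_d < threshold else 0
    return matrix


# Persons 0 and 1 walk side by side; person 2 only passes close to them at frame 1.
tracks = [
    [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)],
    [(1.0, 0.0), (2.0, 0.0), (1.5, 0.5)],
    [(2.0, 0.0), (3.0, 0.0), (-2.0, 1.0)],
]
print(spatial_proximity(tracks[1], 1.5))       # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(spatio_temporal_proximity(tracks, 1.5))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

In the toy example, person 2 looks close to everyone in frame 1, but only persons 0 and 1 stay close across the whole clip. Time-aware proximity can filter out this kind of transient coincidence.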
Proposed Method: SPDP-Net
Two stages: proximity-based relation encoding and multi-granular activity recognition.
Utilizes individual-to-global and individual-to-social paths for enhanced contextual understanding.
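The two paths can be pictured as the same attention operation run under two different masks. The sketch below is a simplified, hypothetical illustration. Single-head dot-product attention without learned projections, a binary `social_mask` (for example, one produced by a proximity test), and concatenation as the fusion step are all assumptions rather than details of SPDP-Net.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(features, mask):
    """Person i aggregates features of every person j with mask[i][j] == 1,
    weighted by a softmax over scaled dot products (single head, no learned projections)."""
    dim = len(features[0])
    outputs = []
    for i, query in enumerate(features):
        keys = [j for j in range(len(features)) if mask[i][j]]
        scores = [sum(a * b for a, b in zip(query, features[j])) / math.sqrt(dim) for j in keys]
        weights = softmax(scores)
        outputs.append([sum(w * features[j][c] for w, j in zip(weights, keys)) for c in range(dim)])
    return outputs


def dual_path(features, social_mask):
    """Individual-to-global path over everyone; individual-to-social path over proximate people only."""
    n = len(features)
    global_mask = [[1] * n for _ in range(n)]
    # Always let a person attend to itself so the social path never has an empty neighbourhood.
    social = [[1 if i == j or social_mask[i][j] else 0 for j in range(n)] for i in range(n)]
    global_out = masked_attention(features, global_mask)
    social_out = masked_attention(features, social)
    # Hypothetical fusion: concatenate the two paths' outputs per person.
    return [g + s for g, s in zip(global_out, social_out)]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
social_mask = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
fused = dual_path(features, social_mask)
print(fused[2][2:])  # [1.0, 1.0]: person 2 has no social neighbours, so its social output is its own feature
```

The global path lets every person see the whole scene, while the social path restricts context to people judged to be in the same group. Keeping them separate lets each granularity draw on the context most relevant to it.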
Experiments
Evaluated on the JRDB-PAR dataset, where SPDP-Net shows significant performance improvements.
Ablation Studies
Examine how spatio-temporal proximity, positional embedding, and relation matrices each contribute to the performance gains.
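Positional embedding is one of the ablated components. The summary does not say which form SPDP-Net uses, so the following sketch assumes a standard Transformer-style sinusoidal encoding applied to the centre of each person's bounding box. `box_positional_embedding` and the box format `(x1, y1, x2, y2)` are hypothetical.

```python
import math


def sinusoidal_embedding(value, dim):
    """Transformer-style sinusoidal encoding of a scalar; `dim` must be even."""
    emb = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        emb.append(math.sin(value * freq))
        emb.append(math.cos(value * freq))
    return emb


def box_positional_embedding(box, dim):
    """Encode a hypothetical box (x1, y1, x2, y2) by embedding its centre x and centre y
    with dim // 2 values each."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return sinusoidal_embedding(cx, dim // 2) + sinusoidal_embedding(cy, dim // 2)


emb = box_positional_embedding((0.0, 0.0, 10.0, 20.0), 8)
print(len(emb))  # 8
```

Here `dim` should be divisible by 4 so that each coordinate receives an even number of sine and cosine values.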
Comparison with State-of-the-Art Methods
Outperforms the compared methods in overall performance, particularly in social group activity recognition.
Social Group Detection Comparison
Achieves the best social group detection results among the compared methods.
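A common way to turn a pairwise relation matrix into social groups is to threshold it and take connected components. The sketch below assumes a soft person-by-person `relation` matrix with scores in [0, 1] and a hypothetical default threshold of 0.5. It illustrates the general idea and is not necessarily the grouping procedure used by SPDP-Net.

```python
def detect_groups(relation, threshold=0.5):
    """Group people whose symmetrized relation score exceeds `threshold`,
    using union-find to compute connected components."""
    n = len(relation)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Average both directions so an asymmetric prediction yields one decision per pair.
            if (relation[i][j] + relation[j][i]) / 2 > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


relation = [
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.6, 1.0],
]
print(detect_groups(relation))                 # [[0, 1], [2, 3]]
print(detect_groups(relation, threshold=0.7))  # [[0, 1], [2], [3]]
```

Connected components make grouping transitive: if A is linked to B and B to C, all three end up in one group even when the A-C score is low. This trade-off is worth keeping in mind when choosing the threshold.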
Visualization of Results
Side-by-side visualizations of ground-truth and predicted relation matrices illustrate the effectiveness of SPDP-Net.
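A relation matrix encodes group membership as pairwise links. A ground-truth matrix can therefore be built directly from group labels and compared entry by entry with a prediction. In this hypothetical sketch, `relation_matrix_from_groups` and the agreement measure `pairwise_agreement` are illustrative only; the summary does not describe a quantitative metric for the visualized matrices.

```python
def relation_matrix_from_groups(group_ids):
    """R[i][j] = 1 if persons i and j (i != j) share a social-group id, else 0."""
    n = len(group_ids)
    return [[1 if i != j and group_ids[i] == group_ids[j] else 0 for j in range(n)]
            for i in range(n)]


def pairwise_agreement(pred, gt):
    """Fraction of off-diagonal entries where a binary predicted matrix matches the ground truth."""
    n = len(gt)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(pred[i][j] == gt[i][j] for i, j in pairs) / len(pairs)


gt = relation_matrix_from_groups([0, 0, 1])  # persons 0 and 1 form a group; person 2 is alone
pred = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]     # wrongly links person 2 to person 0
print(gt)                            # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(pairwise_agreement(pred, gt))  # 4 of the 6 off-diagonal entries match
```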
Statistics
SPDP-Net achieves a new state-of-the-art F1 score of 46.5% on the JRDB-PAR dataset.
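As a reminder of what the reported metric measures, F1 is the harmonic mean of precision and recall. The sketch below computes it for one set of predicted labels against a ground-truth set. The label names are hypothetical, and this summary does not say exactly how the 46.5% figure is aggregated across activity granularities.

```python
def f1_score(predicted, ground_truth):
    """F1 between a set of predicted labels and a set of ground-truth labels."""
    tp = len(predicted & ground_truth)  # labels predicted and present in ground truth
    fp = len(predicted - ground_truth)  # labels predicted but absent from ground truth
    fn = len(ground_truth - predicted)  # ground-truth labels the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one person.
pred = {"walking", "talking", "holding_object"}
gt = {"walking", "talking", "looking_at_phone"}
print(f1_score(pred, gt))  # ~0.667 (precision 2/3, recall 2/3)
```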
Quotes
"Relying solely on spatial proximity is insufficient; it is imperative to incorporate spatio-temporal proximity in PAR."
"SPDP-Net significantly outperforms the state-of-the-art methods by a large margin."