Core Concept
A lightweight and efficient multi-modal fusion framework that leverages positional decoupling, a graph-embedded Transformer, and ego-vehicle motion to accurately predict pedestrian crossing intentions.
Summary
The paper presents GTransPDM, a Graph-embedded Transformer with a Position Decoupling Module (PDM) for pedestrian crossing intention prediction (PCIP). The key highlights are:
- Positional Encoder:
- The PDM was introduced to decompose the pedestrian's lateral movements and simulate depth variations in the image view, addressing the positional distortion issue in on-board camera views.
- The PDM combines the pedestrian's position relative to the road boundary, displacement, and instantaneous velocity to represent the true movement patterns.
- Pose Encoder:
- A GCN-based encoder with learnable edge importance was designed to capture the spatial-temporal dynamics of human pose skeletons, integrating essential factors such as position, skeleton, and ego-vehicle motion.
- Fusion and Temporal Modeling:
- The multi-modal features from the position, pose, and ego-vehicle motion encoders were fused and passed through a Transformer encoder for temporal modeling.
- The lightweight network architecture achieved superior performance, reaching 92% accuracy on the PIE dataset and 87% accuracy on the JAAD dataset, with an inference time of 0.05 ms.
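The fusion-and-temporal-modeling step can be sketched as below. The per-modality feature sizes, concatenation-based fusion, and single-head self-attention are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 16                              # observation length (frames)
d_pos, d_pose, d_ego = 32, 64, 16   # assumed per-modality feature sizes
d_model = d_pos + d_pose + d_ego    # fused feature size

# Stand-ins for the outputs of the position, pose, and ego-motion encoders.
pos_feat  = rng.standard_normal((T, d_pos))
pose_feat = rng.standard_normal((T, d_pose))
ego_feat  = rng.standard_normal((T, d_ego))

# Fusion by concatenation along the feature axis (one common choice).
fused = np.concatenate([pos_feat, pose_feat, ego_feat], axis=-1)  # (T, d_model)

def self_attention(x):
    """Single-head scaled dot-product self-attention over the time axis,
    standing in for the Transformer temporal encoder."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (T, T) frame-to-frame scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over frames
    return attn @ x                               # (T, d_model)

temporal = self_attention(fused)
print(fused.shape, temporal.shape)  # (16, 112) (16, 112)
```

A real implementation would stack several attention layers with feed-forward blocks and attach a classification head for the cross / not-cross decision.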
The proposed GTransPDM outperforms state-of-the-art methods in PCIP, demonstrating the effectiveness of its positional decoupling and graph-embedded Transformer components.
Statistics
The area ratio R was employed to combine with the position variations in PDM to mimic the depth shift in image view.
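A minimal sketch of such an area ratio, computed from pedestrian bounding boxes relative to the first observed frame; the `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration:

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0.0) * max(y2 - y1, 0.0)

def area_ratio(boxes):
    """R_t = area_t / area_0 over a box sequence; R > 1 suggests the
    pedestrian appears larger, i.e. closer to the camera."""
    a0 = box_area(boxes[0])
    return [box_area(b) / a0 for b in boxes]

# Toy sequence: the box grows frame by frame, so R increases.
boxes = [(100, 200, 140, 300), (98, 198, 142, 306), (95, 195, 145, 315)]
print(area_ratio(boxes))  # [1.0, 1.188, 1.5]
```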
The mean acceleration of the ego-vehicle was calculated as acc_t = (s_T - s_0) × FPS / (3.6 × T), where s_0 and s_T are the speeds (km/h) at the start and end of the observation window, FPS is the frame rate, and T is the observation length in frames (the factor 3.6 converts km/h to m/s).
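The quoted formula translates directly to code; the function name is an assumption:

```python
def ego_acceleration(s_0, s_T, fps, T):
    """Mean ego-vehicle acceleration (m/s^2) over a window of T frames.

    Speeds s_0 and s_T are in km/h; dividing by 3.6 converts to m/s,
    and T / fps is the elapsed time in seconds.
    """
    return (s_T - s_0) * fps / (3.6 * T)

# Example: speeding up from 36 km/h (10 m/s) to 54 km/h (15 m/s)
# over 30 frames at 30 FPS, i.e. 5 m/s gained in 1 s.
print(ego_acceleration(36.0, 54.0, fps=30, T=30))  # 5.0
```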
Quotes
"To gain a deeper insight into pedestrian crossing behavior, various factors have been explored in existing approaches, showcasing improved performance by incorporating environmental context [11]–[13]. However, these methods may lead to errors in image processing, instance mask inference, or become computationally intensive."
"Pedestrian trajectories provide valuable insights into pedestrian behavior and have been extensively utilized in PCIP [14]–[17]. Yet, since pedestrian positions in the image are relative to the ego-vehicle, this can hinder the models' ability to accurately identify pedestrian movements in real-world scenarios."