The paper proposes a Hierarchical Space-Time Attention (HSTA) method for micro-expression recognition (MER). The key insights are:
Unimodal Space-Time Attention (USTA): This module captures the temporal relationships between subtle facial movements and specific facial regions by processing video frames through a cascaded self-attention mechanism.
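The paper's exact USTA formulation is not reproduced in this summary; as a hedged illustration of the core mechanism, a single scaled dot-product self-attention pass over flattened space-time tokens might look like the following (numpy, with random stand-in weights and invented names):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=16, seed=0):
    """One self-attention pass over flattened space-time tokens.

    tokens: (n, d) array, e.g. T frames x P patches flattened to n = T*P rows.
    The projection weights here are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n, n) space-time attention map
    return attn @ V                         # (n, d_k) attended features

# A "cascaded" USTA-style block would apply such layers in sequence.
tokens = np.random.default_rng(1).standard_normal((8 * 4, 32))  # 8 frames x 4 patches
out = self_attention(tokens)
print(out.shape)  # (32, 16)
```

Because every token attends to every other token across both frames and patches, a single attention map jointly relates facial regions (space) and their subtle movements over time.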
Crossmodal Space-Time Attention (CSTA): This module fuses information from different modalities (e.g., video frames and special frames/optical flow) while preserving the uniqueness of each modality. It uses a symmetrical cross-attention structure to integrate content across the modalities.
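The summary describes CSTA's symmetric structure only at a high level; a minimal sketch of the idea, with queries from one modality attending to keys/values of the other and vice versa, could look like this (numpy; weights and function names are illustrative, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(q_tokens, kv_tokens, d_k=16, seed=0):
    # queries come from one modality, keys/values from the other;
    # both token sets are assumed to share the feature dimension d
    rng = np.random.default_rng(seed)
    d = q_tokens.shape[1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def symmetric_csta(frame_tokens, flow_tokens, d_k=16):
    # symmetric structure: each modality queries the other, so each
    # stream is enriched by the other while keeping its own token set
    frames_out = cross_attend(frame_tokens, flow_tokens, d_k, seed=0)
    flow_out = cross_attend(flow_tokens, frame_tokens, d_k, seed=1)
    return frames_out, flow_out

rng = np.random.default_rng(2)
frames = rng.standard_normal((32, 64))  # e.g. video-frame tokens
flow = rng.standard_normal((32, 64))    # e.g. optical-flow tokens
f_out, o_out = symmetric_csta(frames, flow)
print(f_out.shape, o_out.shape)  # (32, 16) (32, 16)
```

Keeping two separate output streams, rather than concatenating early, is one way such a design can preserve the uniqueness of each modality while still exchanging information.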
Hierarchical Learning: The authors extend the USTA and CSTA into a hierarchical structure (HSTA) to effectively capture deeper facial cues and motion patterns for improved micro-expression recognition.
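The hierarchical composition can be pictured as alternating unimodal and crossmodal attention at successive levels. The sketch below is an assumption about the overall wiring (numpy, random stand-in weights), not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv, d_k, seed):
    # single attention pass; self-attention when q is kv, cross-attention otherwise
    rng = np.random.default_rng(seed)
    d = q.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    return softmax((q @ Wq) @ (kv @ Wk).T / np.sqrt(d_k)) @ (kv @ Wv)

def hsta(frames, flow, depth=2, d_k=64):
    # at each level: USTA-like self-attention within each modality,
    # then CSTA-like symmetric cross-attention between them
    for level in range(depth):
        frames = attend(frames, frames, d_k, seed=10 * level)
        flow = attend(flow, flow, d_k, seed=10 * level + 1)
        frames, flow = (attend(frames, flow, d_k, seed=10 * level + 2),
                        attend(flow, frames, d_k, seed=10 * level + 3))
    return frames, flow

rng = np.random.default_rng(3)
frames0 = rng.standard_normal((16, 64))
flow0 = rng.standard_normal((16, 64))
f2, o2 = hsta(frames0, flow0)
print(f2.shape, o2.shape)  # (16, 64) (16, 64)
```

Setting `d_k` equal to the input feature dimension keeps token shapes stable across levels, so the same block can be stacked to whatever depth the hierarchy needs.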
The experiments on four benchmark datasets demonstrate the effectiveness of the proposed HSTA approach, outperforming state-of-the-art methods, especially on the large-scale CASME3 dataset. The authors also explore the use of additional data like macro-expressions and objective classes, further enhancing the performance.
Source: Haihong Hao et al., arxiv.org, 05-07-2024
https://arxiv.org/pdf/2405.03202.pdf