AUFormer improves facial Action Unit (AU) detection through a Parameter-Efficient Transfer Learning (PETL) paradigm, a MoKE collaboration mechanism, and the MDWA-Loss.
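The loss side can be illustrated with a rough sketch of a margin-truncated, difficulty-aware weighted asymmetric loss. The per-AU occurrence weights, the margin value, and the focusing exponents below are illustrative assumptions, not the exact formulation used by AUFormer.

```python
import torch

def weighted_asymmetric_au_loss(logits, targets, au_weights, margin=0.05,
                                gamma_pos=1.0, gamma_neg=4.0, eps=1e-8):
    """Sketch of a margin-truncated, difficulty-aware weighted asymmetric loss.

    logits:     (batch, num_aus) raw scores, one column per AU
    targets:    (batch, num_aus) binary AU activation labels
    au_weights: (num_aus,) per-AU weights, e.g. inverse occurrence rates
                (an assumption; the actual weighting scheme may differ)
    """
    p = torch.sigmoid(logits)
    # Margin truncation: shift negative probabilities so that very easy
    # negatives (p below the margin) contribute no gradient.
    p_m = (p - margin).clamp(min=0)

    # Asymmetric focusing: down-weight easy positives and negatives with
    # different exponents, emphasizing harder ("more difficult") samples.
    loss_pos = targets * ((1 - p) ** gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * (p_m ** gamma_neg) * torch.log((1 - p_m).clamp(min=eps))

    loss = -(loss_pos + loss_neg) * au_weights  # broadcast per-AU weights
    return loss.mean()

# Toy usage: 12 AUs, weights derived from assumed occurrence rates.
rates = torch.tensor([0.2, 0.1, 0.3, 0.15, 0.05, 0.25, 0.1, 0.2, 0.3, 0.4, 0.1, 0.05])
au_weights = (1.0 / rates) / (1.0 / rates).mean()
logits = torch.randn(8, 12)
targets = torch.randint(0, 2, (8, 12)).float()
print(weighted_asymmetric_au_loss(logits, targets, au_weights).item())
```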
Combining temporal convolution with a GPT-2 backbone improves AU detection accuracy by integrating audio-visual data, enabling a more nuanced understanding of emotional expressions.
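A minimal sketch of how per-frame audio and visual features might be fused with a temporal convolution and then modeled by GPT-2 as a sequence model; the feature dimensions, concatenation-based fusion, and the use of Hugging Face's GPT2Model via inputs_embeds are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class AudioVisualAUModel(nn.Module):
    """Sketch: fuse frame-level audio/visual features, smooth them with a
    temporal convolution, and model the sequence with a pretrained GPT-2."""

    def __init__(self, vis_dim=512, aud_dim=128, num_aus=12):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")   # hidden size 768
        hidden = self.gpt2.config.n_embd
        # 1D convolution over time captures short-range temporal dynamics.
        self.temporal_conv = nn.Conv1d(vis_dim + aud_dim, hidden,
                                       kernel_size=5, padding=2)
        self.head = nn.Linear(hidden, num_aus)          # per-frame AU logits

    def forward(self, vis_feats, aud_feats):
        # vis_feats: (batch, time, vis_dim), aud_feats: (batch, time, aud_dim)
        x = torch.cat([vis_feats, aud_feats], dim=-1)   # simple concat fusion (assumed)
        x = self.temporal_conv(x.transpose(1, 2)).transpose(1, 2)
        # Feed fused embeddings directly to GPT-2, bypassing its token embedding.
        h = self.gpt2(inputs_embeds=x).last_hidden_state
        return self.head(h)                             # (batch, time, num_aus)

model = AudioVisualAUModel()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 16, 128))
print(logits.shape)  # torch.Size([2, 16, 12])
```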
AUFormer introduces a Parameter-Efficient Transfer Learning (PETL) paradigm to AU detection, achieving state-of-the-art performance without relying on additional training data.
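Parameter-efficient transfer learning generally freezes a large pretrained backbone and trains only a few small modules. The sketch below freezes a torchvision ViT and trains only a bottleneck adapter and the AU head; the adapter placement and sizes are illustrative assumptions, not AUFormer's actual MoKE design.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Only these few parameters are trained."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class PETLAUDetector(nn.Module):
    def __init__(self, num_aus=12):
        super().__init__()
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        self.backbone.heads = nn.Identity()             # drop the ImageNet classifier
        for p in self.backbone.parameters():            # freeze all pretrained weights
            p.requires_grad = False
        self.adapter = Adapter(dim=768)                 # trainable, tiny
        self.head = nn.Linear(768, num_aus)             # trainable AU head

    def forward(self, images):
        feats = self.backbone(images)                   # (batch, 768) frozen features
        return self.head(self.adapter(feats))           # per-AU logits

model = PETLAUDetector()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")       # only a small fraction trains
```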
A contrastive learning framework incorporates both self-supervised and supervised signals to learn discriminative features for facial action unit detection, addressing challenges such as class imbalance and noisy labels.
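The supervised half of such a framework can be illustrated with a per-AU supervised contrastive loss, where samples sharing the same activation state for an AU act as positives for each other; the per-AU treatment, temperature, and averaging scheme here are assumptions, and the self-supervised term (e.g., an augmentation-based contrastive loss) is omitted.

```python
import torch
import torch.nn.functional as F

def per_au_supcon_loss(features, au_labels, temperature=0.1):
    """Sketch of a supervised contrastive loss applied per AU.

    features:  (batch, dim) embeddings of face crops
    au_labels: (batch, num_aus) binary AU activation labels
    For each AU, samples with the same activation state are treated as
    positives (an assumption about the framework's design).
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                        # pairwise similarities
    batch = z.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)               # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    losses = []
    for k in range(au_labels.size(1)):
        same = au_labels[:, k:k+1] == au_labels[:, k:k+1].t()  # label agreement
        pos_mask = same & ~self_mask
        pos_count = pos_mask.sum(dim=1)
        valid = pos_count > 0                            # anchors with >= 1 positive
        if valid.any():
            # Mean log-probability over positives, averaged over valid anchors.
            pos_log_prob = (log_prob * pos_mask).sum(dim=1)[valid] / pos_count[valid]
            losses.append(-pos_log_prob.mean())
    return torch.stack(losses).mean()

# Toy usage: 8 samples, 64-d embeddings, 5 AUs.
loss = per_au_supcon_loss(torch.randn(8, 64), torch.randint(0, 2, (8, 5)).float())
print(loss.item())
```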
The AC2D framework adaptively constrains the self-attention weight distribution and causally deconfounds sample-level confounders to improve facial action unit detection.
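One way to read "adaptively constraining the self-attention weight distribution" is as a regularizer that pulls each AU query's attention map toward a prior spatial distribution; the KL-divergence form and the landmark-style prior below are assumptions for illustration, and the causal deconfounding component is not shown.

```python
import torch

def attention_constraint_loss(attn_weights, prior_maps, eps=1e-8):
    """Sketch of a constraint on self-attention weight distributions.

    attn_weights: (batch, num_aus, num_patches) attention each AU query pays
                  to image patches (assumed layout)
    prior_maps:   (batch, num_aus, num_patches) prior spatial distribution per
                  AU, e.g. Gaussians around landmark-defined AU centers (assumed)
    Returns the mean KL divergence KL(attention || prior), penalizing attention
    that drifts far from the expected AU regions.
    """
    attn = attn_weights.clamp(min=eps)
    prior = prior_maps.clamp(min=eps)
    kl = (attn * (attn.log() - prior.log())).sum(dim=-1)
    return kl.mean()

# Toy usage: 2 samples, 12 AUs, a 14x14 patch grid flattened to 196 positions.
attn = torch.softmax(torch.randn(2, 12, 196), dim=-1)
prior = torch.softmax(torch.randn(2, 12, 196), dim=-1)
print(attention_constraint_loss(attn, prior).item())
```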