Core Concepts
The proposed AC2D framework adaptively constrains the self-attention weight distribution and causally deconfounds the sample confounder to improve facial action unit detection performance.
Summary
The paper presents a novel facial action unit (AU) detection framework called AC2D that addresses two key challenges in AU detection:
- Adaptively Constraining Self-Attention:
- The authors explore the mechanism of self-attention weight distribution and propose to adaptively constrain it by exploiting prior knowledge about AU locations.
- This allows the self-attention to capture AU-related local information while preserving global relational modeling capacity.
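The constraint described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian prior heatmap centered on a known AU landmark and a KL-divergence penalty that pulls each query's attention distribution toward that prior, while the softmax keeps the weights a valid distribution so global context is still attended, just down-weighted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_prior(h, w, center, sigma=2.0):
    """Prior attention map concentrated around a predefined AU location."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - center[0])**2 + (xs - center[1])**2) / (2 * sigma**2))
    return (g / g.sum()).reshape(-1)  # flatten to match the attention keys

def attention_prior_loss(attn, prior, eps=1e-8):
    """KL(attn || prior), averaged over queries: encourages each query's
    self-attention weights to concentrate near the AU-location prior."""
    return np.mean(np.sum(attn * np.log((attn + eps) / (prior + eps)), axis=-1))

rng = np.random.default_rng(0)
h = w = 8                                   # feature-map size (assumed)
scores = rng.normal(size=(h * w, h * w))    # raw query-key attention scores
attn = softmax(scores, axis=-1)             # self-attention weight distribution
prior = gaussian_prior(h, w, center=(3, 4)) # hypothetical AU landmark location
loss = attention_prior_loss(attn, prior)    # added to the AU detection loss
```

In practice such a penalty would be one term in the total training loss, weighted against the AU classification objective.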
- Causal Deconfounding of Sample Confounder:
- The authors formulate the causalities among facial image, sample confounder (characteristics), and AU occurrence probability using a causal diagram.
- They then propose a causal intervention module to deconfound the sample confounder for each AU, which helps remove the bias caused by inherent sample characteristics.
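In causal terms, the intervention replaces P(Y|X) with the backdoor-adjusted P(Y|do(X)) = Σ_z P(Y|X, z)P(z), averaging predictions over confounder strata rather than letting the network exploit spurious sample characteristics. A rough sketch of such a deconfounding step, assuming a learned dictionary of confounder prototypes (all names, shapes, and the linear scoring form are illustrative, not the paper's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def deconfounded_au_prob(x, confounders, priors, W_x, W_z):
    """Approximate P(Y | do(X)) = sum_z P(Y | X, z) P(z):
    score the image feature jointly with every confounder prototype,
    then average the per-stratum predictions under the prior P(z)."""
    probs = np.array([sigmoid(x @ W_x + z @ W_z) for z in confounders])
    return probs @ priors  # expectation over the confounder strata

rng = np.random.default_rng(1)
d = 16                                 # feature dimension (assumed)
x = rng.normal(size=d)                 # facial image feature
confounders = rng.normal(size=(5, d))  # 5 confounder prototypes (e.g. clustered
                                       # inherent sample characteristics)
priors = np.full(5, 0.2)               # uniform prior P(z) over the strata
W_x = rng.normal(size=d)
W_z = rng.normal(size=d)
p_au = deconfounded_au_prob(x, confounders, priors, W_x, W_z)
```

Per the paper, one such intervention module would be applied per AU, so each AU's occurrence probability is deconfounded independently.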
The AC2D framework is end-to-end trainable, with the adaptive self-attention constraining and causal deconfounding jointly optimized. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach in both constrained and unconstrained scenarios.
Statistics
The BP4D dataset contains about 140,000 frames annotated with 12 AUs.
The DISFA dataset contains 27 videos of 4,845 frames each, annotated with 8 AUs.
The GFT dataset contains about 132,600 frames annotated with 10 AUs.
The BP4D+ dataset contains 197,875 frames annotated with 12 AUs.
The Aff-Wild2 dataset contains about 1,830,000 frames annotated with 12 AUs.
Quotes
"To resolve this issue, we propose to constrain the self-attention by exploiting prior knowledge about AU locations."
"To eliminate the effect brought by confounder Z so that the trained network predicts Y(j) only based on X, we block the backdoor path between Z and X via a do-operator."