Core Concepts
An efficient framework for expression classification and action unit (AU) detection built on a CLIP image encoder and an MLP.
Abstract
Human affective behavior analysis is crucial for understanding emotions. This work introduces a lightweight framework that combines a CLIP image encoder with a multilayer perceptron (MLP) for expression classification and action unit (AU) detection. The model incorporates Conditional Value at Risk (CVaR) into its loss functions for robustness and flattens the loss landscape for better generalization. Experiments on the Aff-Wild2 dataset show that the method outperforms the official baseline in both tasks while keeping computational demands low, offering an efficient solution for affective behavior analysis.
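The abstract names two ingredients, an MLP on top of a CLIP image encoder and loss-landscape flattening, without saying how the flattening is done. The sketch below assumes sharpness-aware minimization (SAM), a standard flattening technique, applied to a two-layer MLP head trained on frozen, precomputed CLIP embeddings. The frozen encoder, the 512-d embedding size, the hidden width, the learning rate, and `rho` are illustrative assumptions, and random clustered vectors stand in for real CLIP features. It uses plain NumPy with manual backpropagation so the two-pass SAM update stays visible.

```python
import numpy as np

# Hypothetical sizes: 512-d image embeddings (stand-ins for CLIP features),
# 8 expression classes (six basic expressions, neutral, "other").
EMB_DIM, HIDDEN, NUM_CLASSES = 512, 128, 8


def init_params(rng):
    """Two-layer MLP head placed on top of frozen, unit-norm image embeddings."""
    return {
        # Unit-variance weights keep hidden pre-activations O(1) for unit-norm inputs.
        "W1": rng.normal(0.0, 1.0, (EMB_DIM, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), (HIDDEN, NUM_CLASSES)),
        "b2": np.zeros(NUM_CLASSES),
    }


def loss_and_grads(params, X, y):
    """Mean softmax cross-entropy and its gradients via manual backprop."""
    n = X.shape[0]
    z1 = X @ params["W1"] + params["b1"]
    a1 = np.maximum(z1, 0.0)                       # ReLU
    logits = a1 @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = float(-np.log(probs[np.arange(n), y] + 1e-12).mean())

    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dz1 = (dlogits @ params["W2"].T) * (z1 > 0)    # backprop through ReLU
    grads = {
        "W2": a1.T @ dlogits,
        "b2": dlogits.sum(axis=0),
        "W1": X.T @ dz1,
        "b1": dz1.sum(axis=0),
    }
    return loss, grads


def sam_step(params, X, y, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) update.

    1. Perturb the weights toward the approximate worst case in an L2 ball of radius rho.
    2. Compute the gradient there and apply it to the ORIGINAL weights.
    """
    loss, grads = loss_and_grads(params, X, y)
    grad_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values())) + 1e-12
    perturbed = {k: params[k] + rho * grads[k] / grad_norm for k in params}
    _, sharp_grads = loss_and_grads(perturbed, X, y)
    new_params = {k: params[k] - lr * sharp_grads[k] for k in params}
    return new_params, loss


rng = np.random.default_rng(0)
# Stand-in for precomputed CLIP embeddings: class-dependent Gaussian clusters,
# L2-normalized per row (CLIP embeddings are commonly normalized this way).
centers = rng.normal(0.0, 1.0, (NUM_CLASSES, EMB_DIM))
labels = rng.integers(0, NUM_CLASSES, size=256)
features = centers[labels] + 0.5 * rng.normal(0.0, 1.0, (256, EMB_DIM))
features /= np.linalg.norm(features, axis=1, keepdims=True)

params = init_params(rng)
initial_loss, _ = loss_and_grads(params, features, labels)
for _ in range(100):
    params, _ = sam_step(params, features, labels)
final_loss, _ = loss_and_grads(params, features, labels)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The key design choice in SAM is that the gradient is evaluated at the perturbed point `w + rho * g / ||g||` but applied at `w`, which steers training toward regions where the loss stays low across a whole neighborhood (a flat minimum). Only the small head is trained, which is consistent with the lightweight goal, but whether the authors freeze the encoder is not stated here.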
Stats
The Aff-Wild2 dataset consists of 548 videos annotated with the six basic expressions, the neutral state, and an 'other' category.
The training, validation, and test sets contain different numbers of videos in the Expression Classification Challenge and in the Action Unit Detection Challenge.
Our method achieved an 11% improvement in macro F1 score over the official baseline in the Expression Classification Challenge.
In the Action Unit Detection Challenge, our approach improved the macro F1 score by 4% over the official baseline.
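Both challenges are scored with macro F1: the unweighted mean of per-class F1 (expressions, single-label) or per-AU F1 (action units, multi-label), so rare classes count as much as frequent ones. A minimal NumPy sketch of the metric follows. Scoring a class with no true and no predicted positives as F1 = 0 is an assumption; it matches scikit-learn's default `zero_division` handling in `f1_score(..., average='macro')`, which computes the same quantity.

```python
import numpy as np


def binary_f1(truth, pred):
    """F1 for one binary label; returns 0.0 when tp + fp + fn == 0."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.sum(truth & pred)
    fp = np.sum(~truth & pred)
    fn = np.sum(truth & ~pred)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)


def macro_f1_multiclass(y_true, y_pred, num_classes):
    """Expression task: one-vs-rest F1 per class, then the unweighted mean."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([binary_f1(y_true == c, y_pred == c) for c in range(num_classes)]))


def macro_f1_multilabel(Y_true, Y_pred):
    """AU task: F1 per action-unit column, then the unweighted mean."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return float(np.mean([binary_f1(Y_true[:, j], Y_pred[:, j]) for j in range(Y_true.shape[1])]))


# Per-class F1 = 1.0, 0.0, 0.8, so the macro F1 is 0.6 up to float rounding.
print(macro_f1_multiclass([0, 1, 2, 2], [0, 2, 2, 2], num_classes=3))
# Each AU column has F1 = 2/3, so the macro F1 is 2/3.
print(macro_f1_multilabel([[1, 0], [1, 1], [0, 1]], [[1, 0], [0, 1], [0, 0]]))
```

The first example shows why macro averaging is strict: class 1 is never predicted, and its F1 of 0 pulls the average down even though 3 of 4 samples are correct.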
Quotes
"Our contributions are summarized as follows: We propose the first lightweight efficient framework suitable for expression classification and action unit detection."
"We incorporate CVaR into the loss functions, improving the accuracy and reliability of predictions, especially in challenging scenarios for both tasks."
"Our method outperforms the baseline in both tasks, as demonstrated in experiments on the Aff-wild2 dataset."
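The quoted CVaR contribution does not say exactly how CVaR enters the loss. One common empirical form, assumed here, replaces the batch mean of per-sample losses with the mean of the worst `ceil(alpha * n)` losses, so training concentrates on the hardest samples. The sketch applies it to per-sample softmax cross-entropy for expressions and per-sample multi-label binary cross-entropy for AUs; `alpha`, the 8-class and 12-AU shapes, and the random logits are hypothetical.

```python
import math

import numpy as np


def cvar(per_sample_losses, alpha=0.3):
    """Empirical CVaR_alpha: mean of the worst ceil(alpha * n) per-sample losses."""
    losses = np.sort(np.asarray(per_sample_losses, dtype=float))[::-1]  # descending
    k = max(1, math.ceil(alpha * losses.size))
    return float(losses[:k].mean())


def softmax_ce(logits, labels):
    """Per-sample cross-entropy for the single-label expression task."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def multilabel_bce(logits, targets):
    """Per-sample binary cross-entropy, averaged over AUs, for the multi-label task."""
    # max(x, 0) - x * t + log(1 + exp(-|x|)) is the numerically stable BCE-with-logits.
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean(axis=1)


rng = np.random.default_rng(0)
# Hypothetical batch: 16 samples, 8 expression classes, 12 AUs.
expr_losses = softmax_ce(rng.normal(size=(16, 8)), rng.integers(0, 8, size=16))
au_losses = multilabel_bce(rng.normal(size=(16, 12)),
                           rng.integers(0, 2, size=(16, 12)).astype(float))
print("expression: mean", expr_losses.mean(), "CVaR", cvar(expr_losses, alpha=0.25))
print("AU:         mean", au_losses.mean(), "CVaR", cvar(au_losses, alpha=0.25))
```

By construction the CVaR value is never below the plain mean, since it averages only the upper tail. In an autograd framework the same top-k selection passes gradient only to the tail samples; blending CVaR with the ordinary mean is a further option, but the weighting the authors use is not given here.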