Core Concepts
Ensemble learning methods improve compound expression recognition by combining local and global features from different models.
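One common way to combine several models is late fusion: each model scores the classes independently, and the per-class probabilities are averaged. The sketch below illustrates that idea only. The summary does not say which fusion rule the authors use, so the averaging rule, the function names (`softmax`, `late_fusion`, `predict`), and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(per_model_logits, weights=None):
    """Weighted average of per-model class probabilities.

    per_model_logits: one list of class scores per model (hypothetical inputs).
    weights: optional per-model weights; defaults to a uniform average.
    """
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0] * n_models
    total_w = sum(weights)
    weights = [w / total_w for w in weights]  # normalize so weights sum to 1

    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]


def predict(per_model_logits, weights=None):
    """Index of the class with the highest fused probability."""
    fused = late_fusion(per_model_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes from three models (not results from the paper).
vit_logits = [2.0, 0.5, 0.1]
resnet_logits = [0.2, 1.5, 0.3]
manet_logits = [1.0, 0.9, 0.2]

fused = late_fusion([vit_logits, resnet_logits, manet_logits])
label = predict([vit_logits, resnet_logits, manet_logits])
```

Averaging probabilities rather than raw scores keeps models with differently scaled outputs comparable; the optional `weights` argument lets a stronger model count for more.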
Statistics
Due to disparities between the ImageNet dataset and facial expression recognition datasets, we construct a dataset, Unity, based on single-expression annotations from AffectNet [21] and RAF-DB [19]. It contains 306,989 facial images in total: 299,922 for training and 7,067 for validation.
The ViT model processes extracted facial images and yields 768-dimensional embeddings for each image.
Quotes
"Facial expressions have significant research value; however, in daily human life, facial expressions are not always singular in nature."
"ViT leads with an accuracy of 78.09%, followed by ResNet with 75.06%, and MANet with 74.06%."