Core Concepts
This article summarizes the Synthetic Data for Face Recognition (SDFR) competition, which was organized to accelerate research on generating synthetic data for privacy-friendly face recognition models and to narrow the gap between real and synthetic face datasets.
Abstract
The article summarizes the Synthetic Data for Face Recognition (SDFR) competition, held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024). The competition was organized to investigate the use of synthetic data for training face recognition models and to address the legal, ethical, and privacy concerns associated with large-scale web-crawled face recognition datasets.
The competition was divided into two tasks:
Task 1 (Constrained): Participants were required to use a fixed backbone (iResNet-50) and were limited to a maximum of 1 million synthesized images.
Task 2 (Unconstrained): Participants had complete freedom over the model backbone, dataset, and training pipeline, but could train only on synthetic data.
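The two task definitions above can be read as a small rule check. The following Python sketch is only an illustration of that reading: the `Submission` class, its field names, and the `rule_violations` helper are hypothetical and are not part of any official competition tooling. Only the backbone name and the 1 million image cap come from the task descriptions.

```python
from dataclasses import dataclass

# Limits taken from the task descriptions above; all names here are hypothetical.
TASK1_BACKBONE = "iResNet-50"
TASK1_MAX_IMAGES = 1_000_000


@dataclass
class Submission:
    task: int                  # 1 (constrained) or 2 (unconstrained)
    backbone: str
    num_training_images: int
    synthetic_only: bool       # True if no real face images were used for training


def rule_violations(sub: Submission) -> list[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    if sub.task not in (1, 2):
        problems.append(f"unknown task {sub.task}")
    # Both tasks require training on synthetic data only.
    if not sub.synthetic_only:
        problems.append("training data must be synthetic only")
    # Only Task 1 fixes the backbone and caps the dataset size.
    if sub.task == 1:
        if sub.backbone != TASK1_BACKBONE:
            problems.append(f"Task 1 requires the {TASK1_BACKBONE} backbone")
        if sub.num_training_images > TASK1_MAX_IMAGES:
            problems.append(f"Task 1 allows at most {TASK1_MAX_IMAGES} images")
    return problems
```

For example, `rule_violations(Submission(1, "iResNet-50", 900_000, True))` returns an empty list. The sketch does not model how the synthetic images themselves were produced, which a real rule check would also need to consider.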
The submitted models were evaluated on a diverse set of seven benchmark datasets, including high-quality unconstrained, cross-pose, cross-age, and challenging mixed-quality datasets. The competition rules were designed to encourage new ideas for generating privacy-friendly datasets while prohibiting the use of large-scale web-crawled datasets.
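To make the multi-benchmark evaluation concrete, here is a minimal Python sketch of scoring one model on several benchmarks and averaging the results. This summary does not state which metric or protocol the competition used, so everything here is an assumption: pairwise verification accuracy with cosine similarity and a fixed threshold is used only because it is a common choice for face verification, and the function names and the `benchmarks` dictionary layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, threshold):
    """pairs: iterable of (embedding_a, embedding_b, same_identity: bool)."""
    correct = 0
    total = 0
    for emb_a, emb_b, same in pairs:
        # Predict "same identity" when the embeddings are similar enough.
        predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
        correct += predicted_same == same
        total += 1
    return correct / total


def average_accuracy(benchmarks, threshold=0.5):
    """benchmarks: dict mapping a benchmark name to its list of pairs.

    Returns the per-benchmark accuracies and their unweighted mean.
    """
    scores = {
        name: verification_accuracy(pairs, threshold)
        for name, pairs in benchmarks.items()
    }
    return scores, sum(scores.values()) / len(scores)
```

The fixed default threshold of 0.5 is a simplification for illustration; in practice the decision threshold would normally be tuned rather than hard-coded.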
The article provides a detailed description of the submissions from the participating teams, including the methods used for generating synthetic datasets and training face recognition models. The results show that the submitted models improved on baselines trained with synthetic datasets, but a significant gap remains between models trained with synthetic data and models trained with large-scale web-crawled datasets.
The article also discusses submissions whose datasets conflicted with the competition rules, such as DCFace, SFace, and GANDiffFace, which relied on large-scale web-crawled datasets for training the face generator models. The article highlights the importance of generating synthetic datasets without using large-scale web-crawled datasets to ensure privacy-friendly face recognition models.
Finally, the article further discusses training face recognition models with synthetic data and highlights potential future research directions in the field, such as scaling synthetic datasets, increasing the variation in generated images, and exploring methods that do not rely on pre-trained face recognition models based on large-scale web-crawled datasets.
Stats
The article does not contain any specific sentences with key metrics or important figures. The focus is on the competition setup, submissions, and discussion of the results.
Quotes
The article does not contain any striking quotes supporting the authors' key arguments.