Core Concepts
The paper's core claim is that gradient magnitudes, obtained by backpropagating the difference between the Batch Normalization statistics of a test sample and those of the training data, provide an effective and efficient measure of a face image's utility for automated face recognition systems, with no quality labels or specialized model training required.
Abstract
The paper presents a novel approach, called GRAFIQS, for face image quality assessment (FIQA) that leverages the gradient magnitudes obtained during the backpropagation step of a pre-trained face recognition (FR) model. Unlike recent high-performing FIQA approaches that rely on face embeddings, GRAFIQS does not require quality labeling or training of regression networks.
The key idea is to measure the shift in Batch Normalization statistics (BNS), namely the mean and variance, between the values recorded during FR training and those obtained by passing a test sample through the pre-trained FR model. The authors then backpropagate this difference in BNS through the pre-trained model to generate gradient magnitudes, whose absolute sum serves as the face image quality (FIQ) score.
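The two quantities described above can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: M is the number of Batch Normalization layers, \hat{\mu}_i and \hat{\sigma}_i^2 are the statistics stored in layer i of the pre-trained model, \mu_i(x) and \sigma_i^2(x) are the statistics produced by test sample x, and z is the tensor with respect to which gradients are taken. The exact normalization used by the authors may differ.

```latex
% BNS-based mean squared error for a test sample x (assumed notation)
\mathcal{L}_{\mathrm{BNS}}(x) = \frac{1}{M} \sum_{i=1}^{M}
  \left( \left\lVert \mu_i(x) - \hat{\mu}_i \right\rVert_2^2
       + \left\lVert \sigma_i^2(x) - \hat{\sigma}_i^2 \right\rVert_2^2 \right)

% Quality score: absolute sum of the backpropagated gradient entries
S(x) = \sum_{j} \left| \frac{\partial \mathcal{L}_{\mathrm{BNS}}(x)}{\partial z_j} \right|
```

Under this reading, a large S(x) means the sample lies far from the training statistics and is therefore of low utility.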
Through extensive experiments on various benchmarks, the authors demonstrate that their approach achieves results competitive with recent state-of-the-art FIQA methods without relying on quality labeling, regression network training, specialized architectures, or the design and optimization of specific loss functions.
The authors first show that using the BNS-based mean squared error (MSEBNS) directly as FIQ can improve face verification performance by discarding low-quality samples. They then demonstrate that utilizing the gradient magnitudes obtained by backpropagating MSEBNS through the pre-trained model leads to significantly better results than using MSEBNS alone.
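As a minimal sketch of the first step, the snippet below computes a BNS-based mean squared error for toy activations and ranks samples by it. It is not the authors' implementation: the function names, the per-layer averaging, and the toy numbers are assumptions, and in practice the activations would come from the Batch Normalization layers of a pre-trained FR network.

```python
from statistics import fmean, pvariance


def sample_stats(features):
    """Per-channel mean and biased variance of one sample's activations.

    features: list of channels, each a list of activation values.
    """
    means = [fmean(channel) for channel in features]
    variances = [pvariance(channel) for channel in features]
    return means, variances


def mse_bns(layer_features, stored_stats):
    """Average squared gap between sample statistics and stored statistics.

    layer_features: one entry per BN layer (hypothetical toy activations).
    stored_stats: one (means, variances) pair per BN layer.
    """
    total = 0.0
    for features, (stored_mean, stored_var) in zip(layer_features, stored_stats):
        means, variances = sample_stats(features)
        n_channels = len(means)
        total += sum((m - s) ** 2 for m, s in zip(means, stored_mean)) / n_channels
        total += sum((v - s) ** 2 for v, s in zip(variances, stored_var)) / n_channels
    return total / len(layer_features)


# One BN layer with two channels; stored mean 0 and variance 1 per channel.
stored = [([0.0, 0.0], [1.0, 1.0])]
typical = [[[-1.0, 1.0, -1.0, 1.0], [1.0, -1.0, 1.0, -1.0]]]
shifted = [[[2.0, 4.0, 2.0, 4.0], [4.0, 2.0, 4.0, 2.0]]]

scores = {"typical": mse_bns(typical, stored), "shifted": mse_bns(shifted, stored)}
# A smaller gap is treated as higher quality, so sort in ascending order.
ranked = sorted(scores, key=scores.get)
print(ranked)
```

Discarding the samples at the end of such a ranking corresponds to the filtering step described above.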
The authors compare their GRAFIQS approach to various image quality assessment (IQA) methods and state-of-the-art FIQA approaches, and show that GRAFIQS achieves competitive or better performance on benchmarks with large age gaps, pose variations, and quality variations, without the need for specialized training or design choices.
Stats
The mean and variance recorded by the Batch Normalization layers during training of the face recognition model are stored as part of the pre-trained model's parameters.
The mean squared error (MSE) between the Batch Normalization statistics of the training data and those obtained by passing a test sample through the pre-trained model is used as the loss function for backpropagation.
The absolute sum of the gradient magnitudes obtained by backpropagating the MSE loss through the pre-trained model is used as the face image quality (FIQ) score.
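The scoring step can be illustrated with a toy model small enough to differentiate by hand. In the sketch below, a single per-channel scaling layer stands in for the pre-trained FR network, and the gradient is taken with respect to the input values; both choices, along with all function names, are assumptions made for illustration. A real implementation would rely on a deep learning framework's automatic differentiation instead of the analytic gradient written out here.

```python
from statistics import fmean, pvariance


def bns_loss(x, weights, stored_mean, stored_var):
    """MSE between the sample's per-channel statistics and the stored ones.

    x: list of channels (lists of floats); weights: one scale per channel.
    """
    loss = 0.0
    for channel, w, sm, sv in zip(x, weights, stored_mean, stored_var):
        y = [w * value for value in channel]
        loss += (fmean(y) - sm) ** 2 + (pvariance(y) - sv) ** 2
    return loss / len(x)


def bns_gradient(x, weights, stored_mean, stored_var):
    """Analytic gradient of bns_loss with respect to every input value."""
    n_channels = len(x)
    grads = []
    for channel, w, sm, sv in zip(x, weights, stored_mean, stored_var):
        y = [w * value for value in channel]
        n = len(y)
        mu, var = fmean(y), pvariance(y)
        # d(mu)/dy_k = 1/n and d(var)/dy_k = 2 * (y_k - mu) / n;
        # the chain rule through y = w * x contributes the factor w.
        grads.append([
            w * (2 * (mu - sm) / n + 2 * (var - sv) * 2 * (yk - mu) / n) / n_channels
            for yk in y
        ])
    return grads


def gradient_score(x, weights, stored_mean, stored_var):
    """Absolute sum of the gradient entries; larger means lower utility."""
    grads = bns_gradient(x, weights, stored_mean, stored_var)
    return sum(abs(g) for row in grads for g in row)


weights = [1.0, 0.5]
stored_mean, stored_var = [0.0, 0.0], [1.0, 1.0]
matched = [[-1.0, 1.0, -1.0, 1.0], [-2.0, 2.0, -2.0, 2.0]]
shifted = [[2.0, 4.0, 2.0, 4.0], [4.0, 8.0, 4.0, 8.0]]

matched_score = gradient_score(matched, weights, stored_mean, stored_var)
shifted_score = gradient_score(shifted, weights, stored_mean, stored_var)
print(matched_score, shifted_score)
```

The sample whose statistics match the stored ones yields a zero gradient, while the shifted sample yields a larger score, which is consistent with reading high gradient magnitudes as low utility.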
Quotes
"Unlike recent high-performing FIQA approaches that rely on face embeddings, our approach does not require quality labeling and training of regression networks."
"We propose to assess the utility of any given test sample by calculating the required changes in the pretrained FR model weights to minimize the difference between the test sample and the model training data distribution."
"We theorize that, given the BNS calculated on a training dataset and the BNS of an input image, high gradient magnitudes resulting from MSE loss indicate a low utility of the input image, and vice versa."