Core Concepts
Utilizing adversarial examples as counterfactuals to fine-tune computer vision models and mitigate biases without compromising accuracy.
Summary
The paper proposes a novel approach to mitigate biases in computer vision models by leveraging adversarial examples as counterfactuals. The key insights are:
Current counterfactual generation algorithms often rely on biased generative models, which can introduce additional biases or spurious correlations into the training pipeline. To address this, the authors propose using adversarial images — perturbed inputs that deceive the model but remain unchanged to human eyes — as counterfactuals for fair model training.
The authors introduce a curriculum-learning-based fine-tuning approach that uses these adversarial counterfactuals (termed Attribute-Specific Adversarial Counterfactuals, or ASACs) to mitigate biases in vision models. The curriculum orders training examples according to how strongly each ASAC fools the original model.
Experiments on the CelebA and LFW datasets demonstrate that the proposed approach significantly improves fairness metrics without sacrificing accuracy. Qualitative results indicate the model's ability to disentangle decisions from protected attributes.
Ablation studies reveal the impact of curriculum design on the fairness-accuracy trade-off, highlighting the importance of strategically organizing the training data.
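As a concrete illustration of the approach described above, the sketch below generates FGSM-style adversarial examples against a toy logistic "model" and orders them into a curriculum by how strongly each fools it. The toy model, epsilon value, and fooling-score heuristic are assumptions for illustration only, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon=0.1):
    """One-step FGSM: perturb x along the sign of the loss gradient.
    For a logistic model p = sigmoid(w.x + b) with binary cross-entropy
    loss, the gradient w.r.t. the input is (p - y) * w."""
    p = sigmoid(w @ x + b)
    return x + epsilon * np.sign((p - y) * w)

def fooling_score(x_adv, y, w, b):
    """Probability the model assigns to the WRONG label on the perturbed
    input -- a proxy for how strongly this counterfactual fools it."""
    p = sigmoid(w @ x_adv + b)
    return p if y == 0 else 1.0 - p

def build_curriculum(examples, w, b, epsilon=0.1):
    """Generate one adversarial counterfactual per (x, y) pair and sort
    them easy-first (least fooling first), mirroring a curriculum
    scheduled by each example's ability to fool the original model."""
    advs = [(fgsm(x, y, w, b, epsilon), y) for x, y in examples]
    return sorted(advs, key=lambda pair: fooling_score(pair[0], pair[1], w, b))

rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
examples = [(rng.normal(size=8), 1.0) for _ in range(4)]

curriculum = build_curriculum(examples, w, b)
scores = [fooling_score(x, y, w, b) for x, y in curriculum]
```

Fine-tuning would then iterate over `curriculum` in order, so the model first sees weakly fooling counterfactuals before strongly fooling ones — the ordering the ablation studies identify as consequential for the fairness–accuracy trade-off.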
Statistics
Accuracy on CelebA:

Method            Smile    Big Nose   Wavy Hair
Base classifier   84.29%   80.77%     81.15%
Proposed (FGSM)   91.79%   82.05%     82.01%
Proposed (PGD)    91.20%   78.80%     81.99%

Accuracy on LFW:

Method            Smile    Bags Under Eyes   Wavy Hair
Proposed (FGSM)   89.19%   87.29%            88.29%
Proposed (PGD)    88.41%   86.42%            88.70%
Quotes
"Counterfactuals generated by these image generation mechanisms may inherently contain spurious correlations due to the presence of bias in the image generation model."
"Our work advocates for their application as counterfactuals, as a way to improve computer vision models, in fairness metrics as well as accuracy."
"By keeping parts of the image unaltered, the likelihood of introducing spurious correlations (propagated by the generative model during the image generation process) into the fairness mitigation pipeline is markedly reduced."