Key Concepts
Using text-to-image diffusion models, the authors generate a large-scale dataset of synthetic counterfactual image-text pairs to probe and mitigate intersectional social biases in state-of-the-art vision-language models.
Summary
The authors present a methodology for automatically generating counterfactual examples to probe and mitigate intersectional social biases in vision-language models (VLMs). They construct a large dataset called SocialCounterfactuals containing over 171,000 image-text pairs that depict various occupations with different combinations of race, gender, and physical characteristics.
Key highlights:
- The authors use text-to-image diffusion models with cross-attention control to generate highly similar counterfactual images that differ only in their depiction of intersectional social attributes.
- They apply a three-stage filtering process to ensure high-quality counterfactual examples are retained in the dataset.
- Evaluations on six state-of-the-art VLMs show significant intersectional biases, with substantial variation in retrieval skewness across different racial and gender attributes.
- Training experiments demonstrate that the SocialCounterfactuals dataset can be effectively used to mitigate intersectional biases in VLMs, with minimal impact on task-specific performance.
- The authors discuss limitations and ethical considerations around their approach and findings.
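The summary does not define the retrieval skewness reported in the evaluation. As a minimal sketch, assume a skew measure of the kind used in fairness-in-ranking work: for each attribute group, take the log ratio of its observed share among the top-k retrieved images to a desired uniform share, and report the largest value. Whether this matches the authors' exact metric is an assumption, and the function names `skew_at_k` and `max_skew_at_k` are hypothetical.

```python
import math
from collections import Counter


def skew_at_k(ranked_groups, k, groups):
    """Per-group log ratio of observed to desired (uniform) share in the top k.

    ranked_groups: group label of each retrieved image, best match first.
    groups: all group labels that could appear (assumed equally desired).
    """
    top = ranked_groups[:k]
    if not top:
        raise ValueError("need at least one retrieved item")
    counts = Counter(top)
    desired = 1.0 / len(groups)
    skews = {}
    for group in groups:
        observed = counts[group] / len(top)
        # A group that never appears in the top k gets a skew of -infinity.
        skews[group] = math.log(observed / desired) if observed > 0 else float("-inf")
    return skews


def max_skew_at_k(ranked_groups, k, groups):
    """Largest per-group skew; 0.0 means the top k is perfectly balanced."""
    return max(skew_at_k(ranked_groups, k, groups).values())
```

Under this definition a value of 0 indicates that every group appears in the top k in equal proportion, and larger values indicate that at least one group is over-represented for the query.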
Statistics
"A photo of a White male doctor"
"A photo of a Black female doctor"
"A photo of an Asian male construction worker"
"A photo of a Latino female construction worker"
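The captions above follow a fixed template that varies race, gender, and occupation. A minimal sketch of how such counterfactual caption sets could be enumerated is shown below. The attribute lists contain only the values visible in the four examples, and the helper names `article` and `build_captions` are hypothetical; the authors' actual prompt construction may differ.

```python
from itertools import product


def article(word):
    """Pick 'a' or 'an' from the first letter (a rough rule, enough for these values)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_captions(races, genders, occupations):
    """Return one caption per (occupation, race, gender) combination.

    Captions that share an occupation form one counterfactual set: they differ
    only in the social attributes, not in the subject being depicted.
    """
    captions = []
    for occupation in occupations:
        for race, gender in product(races, genders):
            captions.append(f"A photo of {article(race)} {race} {gender} {occupation}")
    return captions


# Illustrative attribute values taken from the example captions only.
captions = build_captions(
    races=["White", "Black", "Asian", "Latino"],
    genders=["male", "female"],
    occupations=["doctor", "construction worker"],
)
```

Grouping by occupation first keeps each counterfactual set contiguous, which makes it straightforward to pair every caption with the images generated for the same subject.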
Quotes
"Counterfactual examples, which study the impact on a response variable following a change to a causal feature, have proven valuable in natural language processing (NLP) for probing model biases and improving robustness to spurious correlation."
"Social biases are a particularly concerning type of spurious correlation learned by VLMs. Due to a lack of proportional representation for people of various races, genders, and other social attributes in image-text datasets, VLMs learn biased associations between these attributes and various subjects (e.g., occupations)."