Key concepts
FAIntbench provides a holistic and precise benchmark for evaluating various types of biases in text-to-image generation models, along with a specific bias definition system and a comprehensive dataset.
Summary
The paper introduces FAIntbench, a comprehensive benchmark for evaluating biases in text-to-image (T2I) generation models. It establishes a four-dimensional bias definition system specific to T2I models, allowing precise bias classification. The benchmark includes a dataset of 2,654 prompts covering occupations, characteristics, and social relations, as well as protected attributes such as gender, race, and age.
FAIntbench employs fully automated evaluation based on CLIP alignment, with adjustable evaluation metrics. The evaluation covers four bias dimensions: implicit generative bias, explicit generative bias, ignorance, and discrimination. The authors evaluate seven recent large-scale T2I models with FAIntbench and conduct human evaluations to validate the benchmark's effectiveness.
The results reveal that the evaluated models perform well on gender bias but exhibit considerable racial bias. The authors also find that distillation may increase model biases, suggesting the need for further research. FAIntbench aims to promote a fairer AI-generated-content community by providing a robust benchmark for bias evaluation and encouraging continuous improvement of T2I models.
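To make the CLIP-alignment evaluation concrete, here is a minimal sketch of one plausible way such an automated bias metric could work. The helper names (`classify_attribute`, `bias_score`), the prompt template, and the deviation metric (half the L1 distance from a reference distribution) are illustrative assumptions, not the paper's exact formulas: the idea is that each generated image is assigned the protected-attribute label whose CLIP alignment score is highest, and the resulting label distribution is compared against a reference.

```python
from collections import Counter

def classify_attribute(scores):
    # Assumed classification rule: pick the attribute label whose CLIP
    # alignment score (cosine similarity between the generated image and
    # a text prompt such as "a photo of a {label} person") is highest.
    return max(scores, key=scores.get)

def bias_score(per_image_scores, reference):
    # Illustrative metric (not necessarily FAIntbench's): half the L1
    # distance between the observed attribute distribution and a
    # reference distribution (e.g. uniform, or real demographics).
    # 0.0 means no measurable skew; 1.0 is maximal skew.
    labels = [classify_attribute(s) for s in per_image_scores]
    counts = Counter(labels)
    n = len(labels)
    return 0.5 * sum(abs(counts.get(a, 0) / n - p) for a, p in reference.items())

# Toy example: precomputed CLIP scores for 4 generated images
# against two gender labels (numbers are made up for illustration).
scores = [
    {"male": 0.31, "female": 0.22},
    {"male": 0.28, "female": 0.30},
    {"male": 0.33, "female": 0.25},
    {"male": 0.29, "female": 0.27},
]
uniform = {"male": 0.5, "female": 0.5}
print(bias_score(scores, uniform))  # 3/4 male vs 1/4 female -> 0.25
```

Comparing against a non-uniform reference (e.g. census demographics, as the paper's white-middle-aged-men finding suggests) only requires swapping in a different `reference` dictionary.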
Statistics
The proportion of white middle-aged men depicted in T2I model outputs is significantly higher than real-world demographic statistics.
Across the evaluated models, performance is best on gender bias and worst on race bias.
Distillation of T2I models may increase their biases.
Quotes
"FAIntbench provides a holistic and precise benchmark for various types of biases in T2I models, along with a specific bias definition system and a comprehensive dataset."
"The results reveal that the evaluated models perform well in gender biases but have considerable race biases."
"The authors also find that distillation may increase model biases, suggesting the need for further research."