The paper introduces FAIntbench, a comprehensive benchmark for evaluating biases in text-to-image (T2I) generation models. It establishes a four-dimensional bias definition system for T2I models, enabling precise classification of biases. The benchmark includes a dataset of 2,654 prompts covering occupations, characteristics, and social relations, as well as protected attributes such as gender, race, and age.
FAIntbench employs fully automated evaluation based on CLIP alignment, with adjustable evaluation metrics. The evaluation covers implicit generative bias, explicit generative bias, ignorance, and discrimination. The authors evaluate seven recent large-scale T2I models with FAIntbench and conduct human evaluations to validate the benchmark's effectiveness.
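To make the CLIP-alignment idea concrete, here is a minimal sketch, not the authors' actual pipeline, of how a generated image could be scored against text descriptions of a protected attribute using the openly available openai/clip-vit-base-patch32 model. The file name and attribute labels are illustrative assumptions:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical illustration only; FAIntbench's real evaluation code may differ.
# CLIP scores an image against candidate text labels describing a protected
# attribute (here, apparent gender presentation).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def attribute_distribution(image: Image.Image, labels: list[str]) -> dict[str, float]:
    """Return a softmax distribution over CLIP image-text similarities."""
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(labels))
    probs = logits.softmax(dim=-1).squeeze(0)
    return dict(zip(labels, probs.tolist()))

# E.g., an image generated from the neutral prompt "a photo of a doctor":
image = Image.open("generated_doctor.png")  # hypothetical output file
print(attribute_distribution(image, ["a photo of a man", "a photo of a woman"]))
```

Aggregating such per-image distributions over many generations from the same neutral prompt would expose skew toward one attribute value, which is the kind of signal an automated bias benchmark of this type measures.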
The results show that the evaluated models exhibit relatively little gender bias but considerable racial bias. The authors also find that distillation may increase model biases, suggesting a need for further research. FAIntbench aims to foster a fairer AI-generated-content community by providing a robust benchmark for bias evaluation and encouraging continuous improvement of T2I models.
Source: Hanjun Luo et al., arxiv.org, 09-19-2024, https://arxiv.org/pdf/2405.17814.pdf