The paper investigates potential biases in large language models (LLMs) by examining their performance on sentiment analysis tasks across ethnic groups. The authors use several template-based datasets, including the Amazon, NS-Prompts, and Regard datasets, to measure false positive rate (FPR) gaps for negative- and positive-sentiment predictions across ethnicities.
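A minimal sketch of how such per-group FPR gaps could be computed is given below. The gap definition used here (a group's FPR minus the FPR pooled over all other groups) and all function and variable names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fpr(y_true, y_pred, target_label):
    """False positive rate for one label: fraction of examples whose true
    label is NOT target_label but which the model predicts AS target_label."""
    negatives = y_true != target_label            # examples that should not receive the label
    false_pos = negatives & (y_pred == target_label)
    return false_pos.sum() / max(negatives.sum(), 1)

def fpr_gap(y_true, y_pred, groups, group, target_label):
    """Assumed gap definition: FPR on one group's examples minus FPR on all
    other groups' examples, for a given sentiment label."""
    in_group = groups == group
    return (fpr(y_true[in_group], y_pred[in_group], target_label)
            - fpr(y_true[~in_group], y_pred[~in_group], target_label))

# Toy usage with hypothetical labels: 0 = negative, 1 = neutral, 2 = positive.
y_true = np.array([1, 2, 1, 2, 0, 1, 2, 0])
y_pred = np.array([0, 0, 1, 2, 0, 1, 2, 2])
groups = np.array(["White", "White", "White", "White",
                   "Asian", "Asian", "Asian", "Asian"])

neg_gap = fpr_gap(y_true, y_pred, groups, "White", target_label=0)
pos_gap = fpr_gap(y_true, y_pred, groups, "White", target_label=2)
print(f"Negative-sentiment FPR gap (White vs. rest): {neg_gap:+.2f}")
print(f"Positive-sentiment FPR gap (White vs. rest): {pos_gap:+.2f}")
```

A positive Negative-Sentiment gap and a negative Positive-Sentiment gap for a group, as in the findings below, together indicate that the model skews toward predicting negative sentiment for that group's text.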
The key findings are:
For most models, the Negative-Sentiment FPR gap for text associated with the Caucasian/White ethnicity is significantly above zero, indicating that the models more often misclassify positive or neutral sentiment examples for this group as negative compared to other groups.
The Positive-Sentiment FPR gap for the Caucasian/White group is statistically significantly negative. Combined with the elevated Negative-Sentiment FPR gap, this suggests that the models erroneously label examples associated with the Caucasian/White ethnicity as negative more often than those associated with other groups.
Similar patterns are observed for the African American and Asian groups, but to a smaller extent in the larger variants of the OPT and Llama2 models.
The authors hypothesize that these unexpected findings are not reflective of true biases in the LLMs, but rather an artifact of a mismatch between the structure of the template-based bias datasets, which explicitly mention ethnicity, and the underlying pre-training data of the LLMs, which, due to reporting bias, often does not explicitly state race/ethnicity. This discrepancy may lead the models to treat text that explicitly mentions "White" or "Caucasian" as out-of-domain or eccentric, resulting in the observed performance disparities.
The paper highlights the need to carefully consider the impact of reporting bias in pre-training data when using template-based datasets to measure bias in LLMs, as this can lead to misleading conclusions. The authors suggest exploring alternative approaches, such as using datasets that establish group membership through metadata or classification techniques rather than explicit mention, or incorporating multimodal models that may be less affected by reporting bias.
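As a hypothetical illustration of the contrast the authors draw between explicit mention and metadata-based group membership (the example texts and field names below are invented for this sketch, not taken from the paper):

```python
# Template-based item: the ethnicity is stated explicitly in the text the model
# sees, which may read as out-of-domain relative to pre-training data.
template_item = {
    "text": "The White man went to the restaurant and said the food was great.",
    "group": "White",
    "label": "positive",
}

# Metadata-based item: the text never names the group; group membership is
# recorded only as annotation (e.g., author demographics or a classifier),
# so the input looks like ordinary pre-training text.
metadata_item = {
    "text": "Went to the restaurant last night and the food was great.",
    "metadata": {"author_group": "White", "group_source": "self-reported"},
    "label": "positive",
}
```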
Source: Farnaz Kohan... at arxiv.org, 04-05-2024. https://arxiv.org/pdf/2404.03471.pdf