Core Concepts
Large language models exhibit substantial gender and racial biases in the news content they generate, with biases manifesting at the word, sentence, and document levels. Among the examined models, ChatGPT demonstrates the lowest level of bias, partly due to its use of reinforcement learning from human feedback (RLHF), but it remains vulnerable to generating highly biased content when provided with biased prompts.
Abstract
The study investigates gender and racial biases in AI-generated news content produced by seven representative large language models (LLMs), ranging from early models such as Grover to recent ones such as ChatGPT, Cohere, and LLaMA.
The researchers collected 8,629 news articles from The New York Times and Reuters, two highly ranked news outlets known for their commitment to accurate and unbiased reporting. They then prompted each LLM with the headlines of these articles to generate news content, and evaluated the biases in the generated content at the word, sentence, and document levels.
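A minimal sketch of this headline-prompting setup is below. The `llm` object and the `generate_article` wrapper are hypothetical stand-ins; the paper's exact prompts and API calls are not reproduced here.

```python
# Sketch of the headline-prompting setup described above. `llm` and
# `generate_article` are hypothetical; the study's exact prompts and
# model APIs are not assumed.

def generate_article(llm, headline):
    """Hypothetical wrapper: prompt the model with a headline, return generated text."""
    return llm.complete(f"Write a news article with the headline: {headline}")

def build_corpus(llm, reference_articles):
    """Pair each reference article with the LLM's output for the same headline."""
    pairs = []
    for article in reference_articles:  # e.g., the 8,629 NYT/Reuters articles
        generated = generate_article(llm, article["headline"])
        pairs.append({"reference": article["body"], "generated": generated})
    return pairs
```

Pairing each generated article with the human-written article for the same headline is what lets the study treat the reference corpus as the unbiased baseline in the comparisons that follow.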
At the word level, the AI-generated content (AIGC) produced by each LLM deviated substantially from the reference news articles in the distribution of gender- and race-related words. ChatGPT demonstrated the lowest gender and racial biases at this level, partly due to its reinforcement learning from human feedback (RLHF) feature. However, ChatGPT also produced markedly more biased content when provided with biased prompts, highlighting its vulnerability to malicious exploitation.
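To make the word-level measurement concrete, here is a small sketch that compares the share of gender-specific words in generated versus reference text. The lexicon and the two texts are tiny placeholders, not the word lists or data used in the study.

```python
import re
from collections import Counter

# Tiny placeholder lexicon; the study's gender/race word lists are far larger.
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "female"}

def word_share(text, lexicon):
    """Fraction of tokens in `text` that belong to `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in lexicon) / len(tokens)

# Illustrative texts, not from the dataset.
reference_text = "She told her colleagues the policy would help women across the state."
generated_text = "Officials told colleagues the policy would help residents across the state."

ref = word_share(reference_text, FEMALE_WORDS)
gen = word_share(generated_text, FEMALE_WORDS)
print(f"reference: {ref:.2%}, AIGC: {gen:.2%}")  # a lower AIGC share indicates underrepresentation
```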
At the sentence level, the AIGC exhibited biases in the sentiments and toxicities expressed toward different gender and racial groups; ChatGPT again performed best in mitigating these biases. Similar patterns emerged at the document level, where the AIGC showed significant biases in the semantics and themes conveyed about gender and race, with ChatGPT once more the top performer.
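One way to picture the sentence-level measurement is sketched below: group sentences by which demographic group they mention, score each group's average sentiment, and compare. VADER stands in for whatever sentiment classifier the study actually used, and the lexicons are placeholders.

```python
# Sketch of sentence-level sentiment comparison; VADER is one off-the-shelf
# scorer, not necessarily the study's instrument.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

FEMALE_WORDS = {"she", "her", "woman", "women"}
MALE_WORDS = {"he", "him", "his", "man", "men"}

def mentions(sentence, lexicon):
    return any(tok in lexicon for tok in sentence.lower().split())

def mean_compound(sentences):
    """Average VADER compound score (-1 most negative .. +1 most positive)."""
    scores = [sia.polarity_scores(s)["compound"] for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0

sentences = [
    "She delivered an inspiring speech to the crowd.",
    "He was criticized for the failed negotiation.",
]
gap = mean_compound([s for s in sentences if mentions(s, FEMALE_WORDS)]) \
    - mean_compound([s for s in sentences if mentions(s, MALE_WORDS)])
print(f"female-vs-male sentiment gap: {gap:+.3f}")
```

A systematic gap between groups in the AIGC, relative to the gap in the reference articles, is the kind of sentence-level bias the study reports.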
Overall, the study reveals that the AIGC produced by the examined LLMs deviates substantially from the reference news articles in terms of word choices, expressed sentiments and toxicities, and conveyed semantics related to gender and race. The findings highlight the importance of understanding and addressing the limitations of LLMs to harness their full potential.
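For the document-level comparison of conveyed themes, a plausible instrument is topic modeling. The sketch below uses LDA from scikit-learn on toy documents; the study's own topic model, corpus, and parameters are not assumed here.

```python
# Sketch of document-level theme extraction with LDA; one plausible way to
# compare topic distributions between AIGC and reference corpora.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, not from the dataset.
docs = [
    "the senator discussed healthcare policy with women voters",
    "the team won the championship after a late goal",
    "researchers published findings on vaccine efficacy",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

# Top words per topic; comparing the prevalence of, e.g., female-pertinent
# topics across the two corpora surfaces document-level gaps.
terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```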
Stats
The percentage of female-specific words in AIGC is on average 24.50% to 43.38% lower than in the reference news articles.
The percentage of Black-race-specific words in AIGC is on average 30.39% to 48.64% lower than in the reference news articles.
The percentage of female-pertinent topics in AIGC is on average 26.67% to 43.80% lower than in the reference news articles.
The percentage of Black-race-pertinent topics in AIGC is on average 31.94% to 48.64% lower than in the reference news articles (all figures are relative reductions; see the worked example below).
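These figures read as relative reductions, not percentage-point differences. A worked example of the calculation, with illustrative numbers not taken from the paper:

```python
# Illustrative numbers only; not from the paper.
ref_share = 0.30  # share of female-specific words in the reference articles
gen_share = 0.18  # share in the AIGC for the same headlines

relative_reduction = (ref_share - gen_share) / ref_share
print(f"{relative_reduction:.2%} lower")  # 40.00% lower, the sense used in the stats above
```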
Quotes
"LLMs are trained on archival data produced by humans. Consequently, AIGC could inherit and even amplify biases presented in the training data."
"To harness the potential of LLMs, it is imperative to examine the bias of AIGC produced by them."
"ChatGPT demonstrates the lowest level of bias, which is partly attributed to its reinforcement learning from human feedback (RLHF) feature."
"When a biased prompt bypasses ChatGPT's screening process, it produces a significantly more biased news article in response to the prompt."