Systematic Offensive Stereotyping (SOS) Bias in Language Models: Measurement, Validation, and Impact on Hate Speech Detection
Language models exhibit systematic offensive stereotyping (SOS) bias: a systematic association between profanity and marginalized identity groups. This bias mirrors the hate and extremism these groups experience online, and it can degrade the fairness of downstream tasks such as hate speech detection.
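The paper's own SOS metric is not reproduced here; as a minimal sketch under assumed toy embeddings, one way to quantify an association between profanity and identity terms in an embedding space is to compare the mean cosine similarity of profane-term vectors to each group's term vectors. All names and vectors below are hypothetical placeholders, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association_score(embeddings, profanity_terms, group_terms):
    """Mean cosine similarity between profanity vectors and group-term
    vectors; a persistently higher score for one group than another
    suggests a systematic association in the embedding space."""
    sims = [cosine(embeddings[p], embeddings[g])
            for p in profanity_terms for g in group_terms]
    return sum(sims) / len(sims)

# Toy 2-d embeddings (hypothetical, for illustration only).
emb = {
    "profane_term": [0.9, 0.1],
    "group_x_term": [0.8, 0.2],  # stand-in for a marginalized-group term
    "group_y_term": [0.1, 0.9],  # stand-in for a comparison-group term
}

# A positive gap means the profane term sits closer to group X
# than to group Y in this toy space.
bias_gap = (association_score(emb, ["profane_term"], ["group_x_term"])
            - association_score(emb, ["profane_term"], ["group_y_term"]))
```

In this toy example `bias_gap` is positive, illustrating the kind of asymmetry the abstract describes; a real measurement would use a model's actual embeddings and curated term lists.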