Core Concepts
Automated hate speech detectors' conformity to content policies is crucial for transparent and accountable content moderation.
Stats
"A unified taxonomy of harmful content." - Banko et al., 2020
"Facebook specifies 41 community standards guidelines for moderating hate speech." - Facebook, 2022
"Google’s automatic content moderator detected 95% of unwanted content before it was seen by a user." - Google, 2023b
Quotes
"Because content moderation rules are often uniquely defined, existing hate speech datasets cannot directly answer this question."
"Models generally have high failure rates for non-hateful examples."
"Our dataset highlights the importance of investigating hate speech detectors’ conformity to content policies."