Core Concepts
SGHateCheck is a novel framework for comprehensively evaluating hate speech detection models in the linguistic and cultural context of Singapore and Southeast Asia. It exposes critical flaws in state-of-the-art models and highlights the need for more effective detection tools in diverse linguistic environments.
Abstract
The paper introduces SGHateCheck, a framework for evaluating hate speech (HS) detection models in the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and Multilingual HateCheck (MHC) by employing large language models for translation and paraphrasing into Singapore's main languages (English, Mandarin, Tamil, and Malay), and refining these with native annotators.
SGHateCheck comprises over 21,000 test cases across the four languages, with 11,373 annotated cases. It evaluates models on various functionalities, including:
Distinct expressions of hate (derogatory remarks, threats, slurs, profanity)
Contrastive non-hate content (profanity, negation, references to protected groups without malice)
Counter-speech scenarios
Targeting of non-protected entities
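The functional-testing idea behind these categories can be sketched in code: score a classifier separately on each functionality rather than in aggregate, so that systematic failure modes become visible. The test cases, functionality names, and toy keyword classifier below are hypothetical illustrations, not SGHateCheck's actual data or labels:

```python
from collections import defaultdict

# Hypothetical test cases in the HateCheck style: each pairs a text with
# the functionality it probes and a gold label.
test_cases = [
    {"text": "I hate [GROUP].", "functionality": "derogatory_expression", "gold": "hateful"},
    {"text": "[GROUP] deserve respect.", "functionality": "neutral_group_reference", "gold": "non-hateful"},
    {"text": "Saying 'I hate [GROUP]' is wrong.", "functionality": "counter_speech", "gold": "non-hateful"},
    {"text": "I hate mosquitoes.", "functionality": "non_protected_target", "gold": "non-hateful"},
]

def toy_model(text: str) -> str:
    """Stand-in classifier: flags any text containing 'hate' as hateful."""
    return "hateful" if "hate" in text.lower() else "non-hateful"

def per_functionality_accuracy(cases, model):
    """Score a model on each functionality separately, exposing
    failure modes that a single aggregate accuracy would hide."""
    correct, total = defaultdict(int), defaultdict(int)
    for case in cases:
        total[case["functionality"]] += 1
        if model(case["text"]) == case["gold"]:
            correct[case["functionality"]] += 1
    return {f: correct[f] / total[f] for f in total}

scores = per_functionality_accuracy(test_cases, toy_model)
for func, acc in sorted(scores.items()):
    print(f"{func}: {acc:.0%}")
```

Even this toy setup shows why per-functionality breakdowns matter: the keyword classifier passes the hateful and neutral cases but fails both the counter-speech and non-protected-target tests, exactly the kind of contrastive weakness SGHateCheck is designed to surface.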
The authors benchmark several state-of-the-art open-source language models (mBERT, LLaMA2, Mistral, SEA-LION, SeaLLM) fine-tuned on existing hate speech datasets. The results reveal critical limitations:
Weaker models predominantly misclassify test cases as non-hateful.
Multilingual dataset fine-tuning provides modest performance gains.
Models perform worse on several SGHateCheck functionalities than on the corresponding MHC and HateCheck ones, even in languages like English and Mandarin.
These shortcomings could have severe implications for content moderation, risking harm by inadequately protecting users against hate speech or unnecessarily limiting free expression. By exposing these flaws, SGHateCheck aims to drive the development of more robust hate speech detection models, particularly for the Singapore and Southeast Asian context.
Stats
"To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia."
"SGHateCheck comprises 28 functional tests for Singlish, 26 for Mandarin, and 21 each for Malay and Tamil."
"In total, across four languages, SGHateCheck comprises 21,152 test cases, with 15,052 classified as hateful and 6,100 as non-hateful."
Quotes
"SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation."
"Such shortcomings could have severe implications if these LLMs were deployed for content moderation, risking harm by inadequately protecting users against HS or unnecessarily limiting free expression."