
Comprehensive Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore


Core Concepts
SGHateCheck is a novel framework for comprehensively evaluating hate speech detection models in the linguistic and cultural context of Singapore and Southeast Asia. It exposes critical flaws in state-of-the-art models and highlights the need for more effective detection tools in diverse linguistic environments.
Abstract
The paper introduces SGHateCheck, a framework for evaluating hate speech (HS) detection models in the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and Multilingual HateCheck (MHC) by employing large language models for translation and paraphrasing into Singapore's main languages (English, Mandarin, Tamil, and Malay), and refining these with native annotators. SGHateCheck comprises over 21,000 test cases across the four languages, with 11,373 annotated cases.

It evaluates models on various functionalities, including:
- Distinct expressions of hate (derogatory remarks, threats, slurs, profanity)
- Contrastive non-hate content (profanity, negation, references to protected groups without malice)
- Counter-speech scenarios
- Targeting of non-protected entities

The authors benchmark several state-of-the-art open-source language models (mBERT, LLaMA2, Mistral, SEA-LION, SeaLLM) fine-tuned on existing hate speech datasets. The results reveal critical limitations:
- Weaker models predominantly misclassify test cases as non-hateful.
- Multilingual dataset fine-tuning provides only modest performance gains.
- Compared to MHC and HateCheck, the models underperform on selected SGHateCheck functionalities, even in languages like English and Mandarin.

These shortcomings could have severe implications for content moderation, risking harm by inadequately protecting users against hate speech or unnecessarily limiting free expression. By exposing these flaws, SGHateCheck aims to drive the development of more robust hate speech detection models, particularly for the Singapore and Southeast Asian context.
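The functional-testing approach means performance is reported per functionality rather than as a single aggregate score. The sketch below shows, under stated assumptions, what that evaluation loop could look like; the field names, example functionality labels, and the classify() stub are hypothetical and not the actual SGHateCheck data schema or evaluation code.

```python
# Minimal sketch of HateCheck-style functional evaluation. Field names,
# example functionality labels, and the classify() stub are illustrative
# assumptions, not the actual SGHateCheck schema or evaluation code.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TestCase:
    functionality: str  # e.g. "explicit_derogation", "counter_speech"
    language: str       # e.g. "en", "zh", "ms", "ta"
    text: str
    gold_label: str     # "hateful" or "non-hateful"

def classify(text: str) -> str:
    # Placeholder for a fine-tuned detector's prediction.
    return "non-hateful"

def per_functionality_accuracy(cases: list[TestCase]) -> dict[str, float]:
    # Accuracy broken down by functional test; this granularity exposes
    # failure modes that an aggregate score would hide.
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.functionality] += 1
        if classify(case.text) == case.gold_label:
            correct[case.functionality] += 1
    return {f: correct[f] / total[f] for f in total}
```

A detector that labels everything non-hateful, as the weaker models above tend to do, would score perfectly on the non-hateful functionalities and near zero on the hateful ones, which is exactly the asymmetry a per-functionality breakdown surfaces.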
Stats
"To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia." "SGHateCheck comprises 28 functional tests for Singlish, 26 for Mandarin, and 21 each for Malay and Tamil." "In total, across four languages, SGHateCheck comprises 21,152 test cases, with 15,052 classified as hateful and 6,100 as non-hateful."
Quotes
"SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation." "Such shortcomings could have severe implications if these LLMs were deployed for content moderation, risking harm by inadequately protecting users against HS or unnecessarily limiting free expression."

Deeper Inquiries

How can the template-based generation of test cases in SGHateCheck be further improved to better capture the nuances of hate speech in multilingual, code-switching contexts?

To enhance the template-based generation of test cases in SGHateCheck so that it better captures the nuances of hate speech in multilingual, code-switching contexts, several strategies can be implemented:

- Dynamic Templates: Instead of fixed template-placeholder pairs, templates could be made more dynamic to adapt to the linguistic variation and code-switching commonly found in multilingual contexts. This flexibility would allow for a more accurate representation of the diverse ways hate speech can manifest across languages.
- Language-Specific Templates: Developing language-specific templates that account for the unique characteristics of each language and dialect would improve the accuracy of the test cases. This approach would involve working closely with language experts and native speakers to create templates that resonate with the nuances of each language.
- Incorporating Slang and Idioms: Including slang terms, idiomatic expressions, and culturally specific language elements in the templates would help capture the subtleties of hate speech in code-switching contexts. This would require collaboration with experts familiar with local linguistic nuances.
- Contextual Templates: Introducing templates that consider the context in which hate speech occurs, such as social media interactions, online forums, or specific cultural settings, would provide a more realistic representation of hate speech scenarios. Understanding the context is crucial for accurately detecting hate speech.
- Continuous Iteration and Feedback: Regularly updating and refining the templates based on feedback from language experts, native speakers, and real-world data would ensure that the test cases remain relevant and reflective of evolving language use in multilingual environments.

By implementing these strategies, SGHateCheck can improve the template-based generation of test cases to better capture the complexities of hate speech in multilingual, code-switching contexts.
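As a concrete illustration of the Dynamic Templates and Language-Specific Templates points above, the sketch below expands a shared placeholder slot into per-language test cases, including an invented Singlish paraphrase as a stand-in for code-switched text. The templates, identity terms, and labels are assumptions made up for demonstration; they are not drawn from SGHateCheck, and only non-hateful contrast cases are shown.

```python
# Illustrative template-based generation with a simple code-switching variant.
# Templates, identity terms, and labels are invented for demonstration; they
# are not SGHateCheck templates, and only non-hateful contrast cases appear.

# Language-specific templates sharing one {slot} placeholder.
templates = {
    "en": "I really respect {slot} people.",
    "ss": "Wah, {slot} people damn steady one.",  # invented Singlish paraphrase
}

# Placeholder fillers; code-switched text often keeps an English group term
# inside an otherwise Singlish/Malay/Mandarin/Tamil sentence, so the same
# fillers can be reused across language-specific templates.
identity_terms = ["Malay", "Indian", "Chinese"]

def expand(templates: dict[str, str], terms: list[str]) -> list[dict[str, str]]:
    # Cartesian expansion of templates x fillers; the gold label is fixed per
    # template, so every generated case inherits it.
    cases = []
    for lang, template in templates.items():
        for term in terms:
            cases.append({
                "language": lang,
                "text": template.format(slot=term),
                "label": "non-hateful",
            })
    return cases

if __name__ == "__main__":
    for case in expand(templates, identity_terms):
        print(case)
```

A dynamic extension could attach per-language paraphrase or transliteration functions to each template, or sample mixed-language fillers, instead of hard-coding a single surface form per template.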

How can the SGHateCheck framework be extended to incorporate real-world contextual information and the full spectrum of protected groups to provide a more comprehensive evaluation of hate speech detection models?

To extend the SGHateCheck framework for a more comprehensive evaluation of hate speech detection models, the following steps can be taken:

- Real-World Data Integration: Incorporate real-world data sources, such as social media platforms, online forums, and news articles, to gather a diverse range of hate speech examples. This data can provide insight into the contextual use of hate speech and help in creating more realistic test cases.
- Community Engagement: Engage with community stakeholders, advocacy groups, and individuals from diverse backgrounds to understand the specific hate speech challenges they face. This input can guide the selection of protected groups and common slurs, ensuring a more inclusive evaluation framework.
- Expanded Protected Groups: Include a broader range of protected groups beyond the traditional categories to encompass intersectional identities and marginalized communities. This expansion would provide a more comprehensive evaluation of hate speech detection models across diverse demographics.
- Contextual Scenarios: Develop test cases that simulate real-world hate speech scenarios, taking into account the social, cultural, and political contexts in which hate speech occurs. This approach would provide a more nuanced evaluation of models' performance in detecting hate speech in varied settings.
- Bias Detection Mechanisms: Implement mechanisms to detect and mitigate biases in the selection of protected groups and common slurs. This could involve regular audits, diversity training for annotators, and bias-aware model evaluation techniques to ensure a fair and accurate evaluation process.
- Continuous Improvement: Continuously update the SGHateCheck framework based on feedback from experts, stakeholders, and model performance evaluations. This iterative approach will ensure that the framework remains relevant and effective in evaluating hate speech detection models.

By incorporating real-world contextual information, expanding the spectrum of protected groups, and adopting a proactive approach to bias detection, SGHateCheck can provide a more comprehensive and robust evaluation of hate speech detection models.
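To make the Contextual Scenarios and Expanded Protected Groups suggestions more tangible, one possible extension of a test-case record with contextual metadata is sketched below. The field names and example values are hypothetical and are not part of the released SGHateCheck schema.

```python
# Hypothetical extension of a functional test case with contextual metadata.
# Field names and example values are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class ContextualTestCase:
    text: str
    gold_label: str                  # "hateful" or "non-hateful"
    functionality: str
    language: str
    target_groups: list[str] = field(default_factory=list)  # allows intersectional tags
    platform: str = "unspecified"    # e.g. "forum", "social_media", "news_comment"
    discourse_context: str = ""      # preceding message or thread summary

example = ContextualTestCase(
    text="We should hear them out before judging.",
    gold_label="non-hateful",
    functionality="counter_speech",
    language="en",
    target_groups=["religion", "gender"],  # hypothetical intersectional tags
    platform="forum",
    discourse_context="Reply to a hateful post quoted earlier in the thread.",
)
```

Carrying context alongside each case would let the same text be evaluated under different surrounding conditions, which is closer to how moderation decisions are made in practice.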

What are the potential biases and limitations in the expert-guided selection of protected groups and common slurs used to construct the SGHateCheck test cases?

The expert-guided selection of protected groups and common slurs in SGHateCheck test cases may introduce biases and limitations that could affect the evaluation process. Potential issues include:

- Expert Subjectivity: The selection of protected groups and common slurs is subjective and may vary with the expertise and perspectives of the experts involved. This subjectivity could lead to the omission of groups or slurs that are relevant but not identified by the experts.
- Underrepresentation: Experts may unintentionally underrepresent certain marginalized or less visible groups when selecting protected groups, resulting in an incomplete evaluation of hate speech targeting these groups.
- Cultural Biases: Experts' cultural backgrounds and biases may influence the selection of protected groups and common slurs, potentially overlooking nuances specific to certain cultural contexts. This could limit the framework's effectiveness in capturing the full spectrum of hate speech.
- Limited Diversity: A lack of diversity among the experts involved in selecting protected groups and common slurs could lead to a narrow representation of identities and language variations, hindering the framework's ability to detect hate speech across diverse linguistic environments.
- Inherent Assumptions: The selection process may rest on assumptions about which groups are most vulnerable to hate speech, potentially overlooking emerging forms of hate speech targeting new or evolving identities. This could result in a biased evaluation of hate speech detection models.
- Linguistic Challenges: Experts may face difficulties in identifying common slurs and linguistic nuances across multiple languages, leading to inconsistencies in the selection process and affecting the accuracy and relevance of the generated test cases.

Addressing these biases and limitations requires a careful and inclusive selection process, ongoing diversity training for experts, and regular reviews to ensure a comprehensive and unbiased representation of protected groups and common slurs in SGHateCheck test cases.