
Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data


Core Concepts
Hate speech detection models evaluated on biased datasets largely overestimate real-world performance on representative Nigerian Twitter data. Domain-adaptive pretraining and finetuning on diverse data are key to maximizing hate speech detection performance in this low-resource context.
Abstract
The authors introduce NAIJAHATE, the first dataset annotated for hate speech detection (HSD) that contains a representative sample of Nigerian tweets. They demonstrate that HSD models evaluated on the biased datasets traditionally used in the literature largely overestimate real-world performance on representative data. The authors propose NAIJAXLM-T, a pretrained language model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance. They find that finetuning on linguistically diverse hateful content sampled through active learning significantly improves performance in real-world conditions relative to a stratified sampling approach. The authors also discuss the cost-recall tradeoff in content moderation and show that having humans review about 1% of all tweets flagged as hateful makes it possible to moderate up to 60% of all hateful content on Nigerian Twitter. This highlights the constraints of a human-in-the-loop approach to content moderation as social media usage continues to grow globally.
Stats
Approximately 0.5% of posts on US Twitter are hateful. The prevalence of hateful content on Nigerian Twitter is around 0.16%. Reviewing the 1% of all tweets flagged as hateful by a model makes it possible to moderate 60% of all hateful content on Nigerian Twitter.
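A back-of-envelope calculation ties these figures together: given the stated prevalence and recall at a 1% review budget, the implied precision among reviewed tweets follows directly. The variable names below are ours, not the paper's:

```python
# Back-of-envelope check of the cost-recall tradeoff using the stats above.
prevalence = 0.0016   # ~0.16% of Nigerian tweets are hateful
review_budget = 0.01  # 1% of all tweets flagged for human review
recall = 0.60         # 60% of hateful content caught at that budget

hateful_reviewed = recall * prevalence        # hateful tweets reviewed, as a fraction of all tweets
precision = hateful_reviewed / review_budget  # fraction of reviewed tweets that are hateful (~9.6%)
print(f"precision among reviewed tweets: {precision:.1%}")
```

In other words, at this operating point roughly one in ten tweets sent to human reviewers would actually be hateful, which illustrates why the review budget dominates moderation cost.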
Quotes
"To address the global issue of hateful content proliferating in online platforms, hate speech detection (HSD) models are typically developed on datasets collected in the United States, thereby failing to generalize to English dialects from the Majority World."

"We demonstrate that HSD evaluated on biased datasets traditionally used in the literature largely overestimates real-world performance on representative data."

"We also propose NAIJAXLM-T, a pretrained model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance."

Key Insights Distilled From

by Manuel Tonne... at arxiv.org 03-29-2024

https://arxiv.org/pdf/2403.19260.pdf
NaijaHate

Deeper Inquiries

How can the insights from this study be applied to improve hate speech detection in other low-resource contexts beyond Nigeria?

The insights from this study can be applied to improve hate speech detection in other low-resource contexts by focusing on domain-specific pretraining and finetuning. It is crucial to tailor language models to the linguistic nuances and social contexts of the specific region where hate speech detection is needed. By pretraining models on data from the target region and finetuning them on relevant hate speech datasets, the models can better capture the unique characteristics of hate speech in that context. Additionally, active learning can be utilized to enhance the diversity of training data, ensuring that the models are exposed to a wide range of hate speech examples. This approach can help improve the generalizability and effectiveness of hate speech detection models in low-resource settings.
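As a minimal illustration of the active-learning idea described above (a sketch of uncertainty sampling, not the paper's actual pipeline), one round of selecting candidates for annotation can be written with scikit-learn. The texts, model, and batch size are hypothetical placeholders:

```python
# Sketch: one round of pool-based active learning with uncertainty sampling.
# All texts and labels here are toy placeholders, not real data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["they are vermin", "nice weather today",
                 "those people are parasites", "great match last night"]
labels = [1, 0, 1, 0]  # 1 = hateful, 0 = not hateful
pool = ["they should all leave", "lovely food in Lagos",
        "what a boring game", "these people disgust me"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(labeled_texts), labels)

# Uncertainty sampling: prioritize pool items with probability closest to 0.5,
# i.e. the examples the current model is least sure about.
probs = clf.predict_proba(vec.transform(pool))[:, 1]
query_order = np.argsort(np.abs(probs - 0.5))
to_annotate = [pool[i] for i in query_order[:2]]  # send these to human annotators
print(to_annotate)
```

In a full loop, the newly annotated examples would be added to the labeled set and the model refit, which is how such a procedure surfaces linguistically diverse hateful content that keyword or stratified sampling can miss.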

What are the potential limitations or biases in the active learning approach used to generate diverse training data, and how could these be addressed?

One potential limitation of the active learning approach used to generate diverse training data is the reliance on the initial seed keywords for sampling. This can introduce biases based on the selection of these keywords and may not capture the full spectrum of hate speech present in the data. To address this limitation, a more comprehensive and diverse set of seed keywords can be used to ensure a broader coverage of hate speech topics. Additionally, regular monitoring and adjustment of the active learning process can help mitigate biases and ensure that the training data remains representative of the overall dataset.
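One way to broaden a biased seed list, as suggested above, is to mine terms that co-occur with the seeds in the corpus itself and propose them as candidate seeds for human review. This is a hypothetical sketch with toy data, not the authors' method:

```python
# Sketch: expand seed keywords via corpus co-occurrence (toy example).
from collections import Counter

seeds = {"vermin", "criminals"}
corpus = [
    "they are vermin and parasites",
    "those criminals are parasites",
    "lovely weather in Abuja",
]

cooc = Counter()
for doc in corpus:
    tokens = set(doc.lower().split())
    if tokens & seeds:               # document mentions at least one seed term
        cooc.update(tokens - seeds)  # count candidate terms co-occurring with seeds

# High-count candidates (here "parasites") can be reviewed as new seeds;
# in practice stopword filtering and human vetting would be essential.
print(cooc.most_common(3))
```

Iterating this expansion across rounds, with human vetting of each candidate batch, reduces the dependence on the initial keyword choices.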

Given the constraints of human-in-the-loop moderation, what other complementary approaches could be explored to effectively tackle hate speech at scale on social media platforms?

In addition to human-in-the-loop moderation, several complementary approaches can be explored to effectively tackle hate speech at scale on social media platforms. One approach is the use of semi-supervised learning techniques, where models are trained on a combination of labeled and unlabeled data to leverage the abundance of unlabeled social media content. This can help improve the efficiency and scalability of hate speech detection systems. Furthermore, the integration of user feedback mechanisms and community reporting systems can empower users to flag and report hateful content, aiding in the moderation process. Implementing transparent and robust content moderation policies, along with regular audits and evaluations of moderation systems, can also contribute to more effective hate speech detection and mitigation on social media platforms.
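The semi-supervised idea mentioned above can be sketched with scikit-learn's self-training wrapper, which pseudo-labels unlabeled examples the base classifier is confident about. The toy texts and confidence threshold are illustrative assumptions:

```python
# Sketch: self-training on a mix of labeled and unlabeled text (toy example).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

texts = ["they are vermin", "nice weather today",
         "those people are parasites", "great match last night",
         "these people disgust me", "lovely jollof rice"]
# -1 marks unlabeled examples; the wrapper pseudo-labels confident ones.
labels = np.array([1, 0, 1, 0, -1, -1])

X = TfidfVectorizer().fit_transform(texts)
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.6)
model.fit(X, labels)  # alternates fitting and pseudo-labeling until convergence
preds = model.predict(X)
```

At scale, the unlabeled pool would be the vast stream of unannotated tweets, letting a small annotated set go further than supervised training alone.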