Detecting Abusive Language and Hate Speech in Code-Switched Political Discussions on Nigerian Twitter


Core Concepts
The paper introduces EKOHATE, an abusive language and hate speech dataset for political discussions on Nigerian Twitter, and evaluates state-of-the-art methods for detecting offensive content on it.
Abstract
The paper presents the EKOHATE dataset, a new code-switched abusive language and hate speech detection dataset containing 3,398 annotated tweets gathered from the posts and replies of three leading political candidates in Lagos, Nigeria. The dataset is annotated with both a binary scheme ("normal" vs "offensive") and a fine-grained four-label scheme ("normal", "abusive", "hateful", and "contempt"). The authors provide an empirical evaluation of state-of-the-art methods in both supervised and cross-lingual transfer learning settings. In the supervised setting, they report 95.1 F1 points for binary classification and 70.3 F1 points for the four-label scheme. The authors also demonstrate that the EKOHATE dataset generalizes well to three publicly available offensive language datasets (OLID, HateUS2020, and FountaHate), achieving F1 scores of 71.8, 62.7, and 53.6 respectively. They hope that the EKOHATE dataset will encourage the evaluation of hate speech detection methods in diverse countries and languages.
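For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score over the four labels could be computed. The summary does not say which averaging the authors use, so macro-averaging is an assumption here, and the function names `per_class_f1` and `macro_f1` are hypothetical helpers rather than anything from the paper.

```python
# Illustrative only: macro-averaged F1 over the four EKOHATE labels.
# Macro-averaging is an assumption; the summary does not state the averaging used.
LABELS = ["normal", "abusive", "hateful", "contempt"]


def per_class_f1(gold, pred, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)
```

Because macro-averaging weights every label equally, a rare label such as "hateful" affects the score as much as "normal" does, which is one reason a four-label score can sit well below a binary one.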
Stats
Nigerians have a notable online presence and actively discuss political and topical matters, particularly during the 2023 general election. The EKOHATE dataset contains 3,398 annotated tweets gathered from the posts and replies of three leading political candidates in Lagos, Nigeria. The dataset exhibits three primary characteristics: it is multilingual, features code-switching, and is inherently noisy due to its social media origin.
Quotes
"Nigerians have a notable online presence and actively discuss political and topical matters. This was particularly evident throughout the 2023 general election, where Twitter was used for campaigning, fact-checking and verification, and even positive and negative discourse."

"However, little or none has been done in the detection of abusive language and hate speech in Nigeria."

Deeper Inquiries

How can the EKOHATE dataset be further expanded to include a wider range of political discussions and perspectives in Nigeria?

Expanding the EKOHATE dataset to encompass a broader spectrum of political discussions and perspectives in Nigeria can be achieved through the following strategies:

Diversifying Political Candidates: Include tweets from a more extensive range of political candidates across various parties and regions in Nigeria. This will provide a more comprehensive view of political discourse and allow for a more balanced dataset.

Incorporating Regional Languages: Integrate tweets in additional Nigerian languages to capture a more diverse set of perspectives and discussions. This will ensure that the dataset is representative of the linguistic diversity in Nigeria.

Including Different Election Cycles: Extend the dataset to cover multiple election cycles, including local, state, and national elections. This will offer a longitudinal view of political discussions and enable the analysis of trends over time.

Incorporating Social Media Platforms: Expand data collection to include other social media platforms besides Twitter, such as Facebook, Instagram, and online forums. This will provide a more comprehensive understanding of political discourse across various online channels.

Collaborating with Local Experts: Partner with local researchers, political analysts, and social media experts to identify key topics, hashtags, and influencers in Nigerian political discussions. Their insights can help in curating a more diverse and relevant dataset.

What are the potential challenges in deploying hate speech detection models built on the EKOHATE dataset in real-world scenarios, and how can these challenges be addressed?

Deploying hate speech detection models built on the EKOHATE dataset in real-world scenarios may face the following challenges:

Language Variability: Nigerian Pidgin, Yoruba, and code-switched language in the dataset may pose challenges for models not trained on these languages. Address this by incorporating multilingual models or fine-tuning existing models on diverse language data.

Contextual Understanding: Hate speech detection requires understanding cultural nuances and context. Models may struggle with context-specific expressions. Address this by incorporating contextual embeddings or training models on a diverse range of contexts.

Imbalanced Data: The dataset may have imbalanced classes, with hateful speech being less prevalent. Techniques like oversampling, undersampling, or using class weights can help address this imbalance.

Generalization to New Data: Models trained on the EKOHATE dataset may not generalize well to new, unseen data. Regular model retraining on updated data and continuous evaluation can help maintain model performance.

Ethical Considerations: Ensuring that hate speech detection models do not inadvertently suppress legitimate speech or target specific groups unfairly is crucial. Regular bias assessments and human oversight can help mitigate these risks.
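The class-weighting idea mentioned under Imbalanced Data can be sketched as follows. This is a minimal illustration, not the authors' method: it uses the common inverse-frequency heuristic (total samples divided by the number of classes times the class count), the helper name `inverse_frequency_weights` is hypothetical, and the label counts in the usage note are made up for illustration and are not EKOHATE statistics.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class so that rarer classes count more in the loss.

    weight(c) = n_samples / (n_classes * count(c))
    A perfectly balanced dataset yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy label distribution (invented for illustration, not taken from the dataset).
toy_labels = ["normal"] * 6 + ["abusive"] * 3 + ["hateful"] * 1
toy_weights = inverse_frequency_weights(toy_labels)
```

On the toy distribution, the rare "hateful" class receives a weight six times that of "normal", so each misclassified hateful example contributes proportionally more when the weights are passed to a weighted loss function.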

How can the insights from the EKOHATE dataset be leveraged to develop more inclusive and equitable political discourse on social media platforms in Nigeria and other African countries?

The insights from the EKOHATE dataset can be leveraged to promote more inclusive and equitable political discourse on social media platforms in Nigeria and other African countries through the following strategies:

Education and Awareness: Use the dataset findings to educate social media users about the impact of hate speech and abusive language. Promote awareness campaigns to foster a more respectful online environment.

Community Moderation: Implement community moderation strategies based on the dataset insights to flag and address hate speech effectively. Encourage users to report abusive content and provide mechanisms for swift action.

Algorithmic Intervention: Develop AI algorithms based on the dataset to automatically detect and filter out hate speech. Collaborate with platform providers to integrate these algorithms into content moderation systems.

Dialogue and Engagement: Facilitate constructive dialogue and engagement on social media platforms by highlighting positive discourse and promoting respectful interactions. Encourage political leaders to model inclusive communication.

Policy Development: Use dataset insights to inform the development of policies and guidelines for combating hate speech online. Work with regulatory bodies to enforce regulations that promote responsible online behavior.

By implementing these strategies, the insights from the EKOHATE dataset can contribute to creating a safer, more inclusive online environment for political discourse in Nigeria and across Africa.