Key Concepts
NLP techniques aid in detecting misogyny in code-mixed Hinglish comments.
Summary
This study examines misogynistic comments written in code-mixed Hinglish (Hindi and English) on YouTube videos. The authors point to the rise of online hate speech and cyberbullying, which particularly affects women, and emphasize how few studies address misogyny detection in under-resourced languages. They present a novel dataset of YouTube comments labeled 'Misogynistic' or 'Non-misogynistic' and apply Exploratory Data Analysis (EDA) to examine sentiment scores, word patterns, and other features. The paper covers the study's motivation and hypothesis, a literature review on misogyny detection and code-mixed languages, the dataset details, the EDA findings, and Principal Component Analysis (PCA) results that reveal distinct clusters. It answers its research questions using the EDA insights and concludes by outlining next steps for training machine learning models.
Statistics
Women are disproportionately likely to be victims of online abuse.
The dataset consists of 2,229 YouTube comments labeled as 'Misogynistic' (181) and 'Non-misogynistic' (2,048).
Misogynistic comments are generally longer than non-misogynistic ones.
The average number of characters per comment is 115.22.
According to TextBlob, most comments have slightly positive sentiment scores.
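The class counts and length statistics above can be checked with a few lines of Python. The following is a minimal sketch, not the authors' code: the label counts are the ones reported in this summary, while the helper names (`class_shares`, `mean_length_by_label`) and the `toy_rows` placeholder strings are assumptions introduced only for illustration.

```python
from statistics import mean

# Label counts as reported in the summary above.
LABEL_COUNTS = {"Misogynistic": 181, "Non-misogynistic": 2048}


def class_shares(counts):
    """Return each label's share of the total as a fraction."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def mean_length_by_label(rows):
    """Average character count per label for (comment, label) pairs."""
    lengths = {}
    for text, label in rows:
        lengths.setdefault(label, []).append(len(text))
    return {label: mean(vals) for label, vals in lengths.items()}


# Hypothetical placeholder rows, NOT taken from the dataset.
toy_rows = [
    ("aaaa", "Non-misogynistic"),
    ("bbbbbb", "Non-misogynistic"),
    ("cccccccccc", "Misogynistic"),
]

shares = class_shares(LABEL_COUNTS)
imbalance_ratio = LABEL_COUNTS["Non-misogynistic"] / LABEL_COUNTS["Misogynistic"]
toy_means = mean_length_by_label(toy_rows)
```

With the reported counts, the 'Misogynistic' class makes up about 8.1% of the 2,229 comments, an imbalance of roughly 1 to 11.3. That skew is worth keeping in mind when interpreting averages computed over the whole dataset, such as the 115.22-character mean.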
Quotes
"Platforms employ techniques such as keyword filtering and manual content moderation to remove hateful and offensive content."
"Users from multicultural countries combine their local languages with English in online posts."
"Hate speech detection is crucial while keeping the context of the conversation in mind."