The authors developed a comprehensive framework and dataset for analyzing communal violence in Bangla social media content. The framework categorizes communal violence into four identity-based dimensions (Religio-communal, Ethno-communal, Nondenominational Communal, and Noncommunal) and four expressions of violence (Derogation, Antipathy, Prejudication, and Repression).
The authors constructed a dataset of 13,000 Bangla social media comments annotated by a team of experts using this framework. Exploratory data analysis revealed imbalances in the dataset, with Religio-communal violence being the most prevalent.
The authors benchmarked the dataset using a BERT-based model, which achieved reasonable performance on the 4-class classification task but struggled on the more granular 16-class classification. This highlights the computational challenges in accurately detecting the nuanced expressions of communal violence in Bangla text.
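The gap between the two granularities is easier to see once the label space is written out. The sketch below assumes that the 16 classes are the cross product of the four identity-based dimensions and the four expressions of violence, which this summary does not state explicitly. The names `FINE_LABELS` and `coarse_label`, and the " / " separator, are hypothetical and used only for illustration.

```python
from itertools import product

# Category names as listed in the framework description.
IDENTITY_DIMENSIONS = [
    "Religio-communal",
    "Ethno-communal",
    "Nondenominational Communal",
    "Noncommunal",
]
EXPRESSIONS = ["Derogation", "Antipathy", "Prejudication", "Repression"]

# Assumption: each fine-grained label is one (dimension, expression) pair,
# giving 4 x 4 = 16 classes. The separator is an arbitrary illustrative choice.
FINE_LABELS = [
    f"{dimension} / {expression}"
    for dimension, expression in product(IDENTITY_DIMENSIONS, EXPRESSIONS)
]


def coarse_label(fine_label: str) -> str:
    """Collapse a fine-grained label back to its identity-based dimension."""
    return fine_label.split(" / ")[0]


if __name__ == "__main__":
    print(len(FINE_LABELS))
    for label in FINE_LABELS[:4]:
        print(label, "->", coarse_label(label))
```

Under this reading, every coarse class splits into four fine-grained ones, so a 13,000-comment dataset that is already imbalanced across the four dimensions leaves far fewer examples per class in the 16-class setting.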
The authors discuss the limitations of their work, including the linguistic complexity of Bangla, the subjectivity in annotating subtle forms of violence, and the data imbalance. They emphasize the importance of this research in monitoring and addressing the surge of communal hatred in online Bangla spaces.