Using large language models to extract rationales for hate speech detection enables interpretable classifiers.
This study introduces a new task called Implicit Target Span Identification (iTSI) that aims to detect both explicit and implicit references to target groups in hate speech content. The authors create a novel dataset called Implicit-Target-Span (ITS) by leveraging a pooling-based annotation approach to capture a diverse set of implicit and explicit target spans. They also establish a baseline model, TargetDetect, using sequence tagging techniques to identify target spans in the ITS dataset.
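To make the sequence-tagging formulation concrete, here is a minimal sketch of how per-token tags can be turned back into target spans. It assumes a simple BIO scheme (B = span begins, I = span continues, O = outside any span); the summary above does not state which tagging scheme TargetDetect uses, and the function name `bio_to_spans` and the example tokens are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-token BIO tags into (start, end_exclusive, text) spans.

    Assumption: tags are the plain strings "B", "I", and "O". An "I" with no
    preceding "B" is tolerated and treated as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open span began, if any

    for i, tag in enumerate(tags):
        if tag == "B":
            # A new span starts here; close the open one first.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I":
            # Continue the open span, or open one if none exists.
            if start is None:
                start = i
        else:
            # Any other tag (normally "O") closes the open span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None

    # Close a span that runs to the end of the sequence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hypothetical example: an implicit target expressed without naming a group.
example_tokens = ["those", "people", "next", "door", "ruin", "everything"]
example_tags = ["B", "I", "I", "I", "O", "O"]
example_spans = bio_to_spans(example_tokens, example_tags)
# example_spans == [(0, 4, "those people next door")]
```

A tagger trained on span annotations would produce the `tags` sequence; this decoding step only recovers the spans from its output and says nothing about how the tags are predicted.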
SWE2 is a novel hate speech detection framework that leverages both word-level semantic information and subword knowledge (character-level and phonetic-level) to achieve high performance and robustness against character-level adversarial attacks.
This work proposes a novel data-augmented, fairness-aware framework with uncertainty estimation, built on Bidirectional Quaternion-Quasi-LSTM layers, for effective, robust, and fair hate speech detection.