Basic Concepts
Our approach to the TRAC-2024 Offline Harm Potential Identification task combined pretrained language models with contrastive learning to classify the offline harm potential of social media content in several Indian languages.
Summary
The TRAC-2024 challenge focused on evaluating the offline harm potential of online content, with two sub-tasks:
- Sub-task 1a: Classifying the potential of a document to cause offline harm on a 4-tier scale from 'harmless' to 'highly likely to incite harm'.
- Sub-task 1b: Predicting the potential target identities (e.g., gender, religion, political ideology) impacted by the harm.
Our team, NJUST-KMG, participated in sub-task 1a, using a combination of pretrained models, including XLM-R, MuRILBERT, and BanglaBERT, together with contrastive learning to enhance the models' ability to discern subtle distinctions in the multilingual dataset.
The key aspects of our approach were:
- Finetuning the pretrained models on the provided dataset to adapt them to the task (see the finetuning sketch after this list).
- Integrating contrastive learning to improve the models' capacity to differentiate between closely related harm-potential categories, addressing the high intra-class variation and inter-class similarity in the data (see the contrastive-loss sketch after this list).
- Employing an ensemble strategy at test time to combine the strengths of the diverse models, improving the overall performance and reliability of the system (a simple ensembling sketch follows the results below).
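The system description does not include code, so the following is a minimal sketch of what finetuning one of these encoders could look like, assuming a Hugging Face `AutoModelForSequenceClassification` setup; the checkpoint name, dataset wrapper, and hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Illustrative finetuning sketch (not the authors' exact configuration):
# XLM-R with a 4-way classification head for the harm-potential scale.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"   # MuRILBERT / BanglaBERT checkpoints could be swapped in
NUM_LABELS = 4                    # 'harmless' ... 'highly likely to incite harm'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

class CommentDataset(Dataset):
    """Wraps (comment text, harm-potential label) pairs."""
    def __init__(self, texts, labels):
        self.texts, self.labels = texts, labels

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = tokenizer(self.texts[idx], truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def finetune(texts, labels, epochs=3, lr=2e-5):
    loader = DataLoader(CommentDataset(texts, labels), batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss   # cross-entropy over the 4 classes
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```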
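The exact contrastive objective is not given in this summary either, so the sketch below uses a generic supervised contrastive loss (in the spirit of SupCon) computed on sentence embeddings, which pulls together examples that share a label and pushes apart the rest; the function name `supcon_loss` and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of sentence embeddings:
    embeddings with the same harm-potential label are pulled together,
    embeddings with different labels are pushed apart."""
    z = F.normalize(embeddings, dim=1)                 # (batch, dim) unit vectors
    sim = z @ z.T / temperature                        # pairwise similarities
    batch = z.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # ignore self-pairs
    # Positives: other examples in the batch with the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                             # anchors with at least one positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()
```

In training, such a term would typically be added to the cross-entropy loss from the classification head, e.g. `loss = ce_loss + lambda_cl * supcon_loss(cls_embeddings, labels)`, where the weighting factor is again an assumed detail.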
Our method achieved an F1 score of 0.73 on sub-task 1a, ranking second among the participants. The incorporation of contrastive learning and the ensemble approach were instrumental in enhancing the model's ability to navigate the linguistic and cultural complexities inherent in the multilingual social media content.
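The precise combination rule used at the testing phase is not spelled out here, so the following is one plausible realization: averaging the predicted class probabilities of the separately finetuned models; the checkpoint paths are placeholders.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder paths to the separately finetuned checkpoints (illustrative only).
CHECKPOINTS = ["./xlmr-finetuned", "./murilbert-finetuned", "./banglabert-finetuned"]

def ensemble_predict(text):
    """Average class probabilities across the finetuned models and return
    the harm-potential class with the highest mean probability."""
    probs = []
    for ckpt in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt)
        model.eval()
        enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits               # shape (1, num_labels)
        probs.append(F.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)
    return int(mean_probs.argmax(dim=-1))
```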
Statistics
The dataset for the TRAC-2024 challenge consisted of social media comments in several Indian languages, annotated by expert judges to capture each comment's nuanced potential to cause offline harm.
Quotes
"Contrastive learning, by design, operates on the principle of distinguishing between similar and dissimilar pairs of data, effectively 'pushing apart' representations of different categories while 'pulling together' representations of the same category."
"The ensemble strategy employed at the testing phase not only solidifies the individual strengths of diverse models but also ensures our system's resilience and generalization across different data points."