Core Concepts
The paper addresses fake news detection (FND) in Urdu by creating a benchmark dataset, "Ax-to-Grind Urdu", to bridge gaps and limitations in existing resources.
Abstract
Abstract:
Misinformation's impact on society.
Lack of fact-checking portals for regional languages.
Introduction of "Ax-to-Grind Urdu" dataset.
Introduction:
Significance of Fake News Detection (FND).
Examples of the global impact of fake news (FN).
Importance of FND in the digital era.
Related Work:
Overview of previous datasets and techniques used for Urdu FND.
Comparison of performance metrics with existing models.
Ax-to-Grind Dataset:
Dataset Collection and Annotation:
Sources from which the true and fake news items were collected.
Removal of meaningless words and symbols from raw data.
Corpus Statistics:
Unique words: 29,911.
Average words per news item: true = 34.82, fake = 116.98, combined = 75.90.
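The two statistics above can be computed with a short script. The sketch below assumes plain whitespace tokenization (the summary does not state which tokenizer produced 29,911 unique words), and the function name corpus_stats is hypothetical. Note that the combined figure is a mean over all items, so it is weighted by class size; the reported values satisfy (34.82 + 116.98) / 2 = 75.90, which is consistent with, though it does not prove, equal numbers of true and fake items.

```python
def corpus_stats(true_items, fake_items):
    """Unique-word count and average words per item, per class and combined.

    Assumption: whitespace tokenization; the paper's tokenizer may differ.
    """
    def tokens(text):
        return text.split()

    # Vocabulary over the whole corpus (both classes).
    vocab = set()
    for text in true_items + fake_items:
        vocab.update(tokens(text))

    def avg_len(items):
        return sum(len(tokens(t)) for t in items) / len(items) if items else 0.0

    return {
        "unique_words": len(vocab),
        "avg_true": avg_len(true_items),
        "avg_fake": avg_len(fake_items),
        # Mean over all items, i.e. weighted by the number of items per class.
        "avg_combined": avg_len(true_items + fake_items),
    }
```

With two items per class, the combined average equals the plain mean of the two class averages; with unequal class sizes it would not.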
Dataset Pre-processing:
Techniques used for cleaning data before model input.
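The cleaning steps described above (removing meaningless words and symbols before model input) might look roughly like the sketch below. The specific steps, the regular expressions, and the stop-word list URDU_STOPWORDS are all illustrative assumptions, not the dataset's documented pipeline: URLs, Urdu/Arabic punctuation, diacritics, and digits are stripped, any character outside the Arabic Unicode block (U+0600 to U+06FF) is dropped, and stop words are filtered.

```python
import re

# Hypothetical stop-word list for illustration only; the dataset's actual list is not given.
URDU_STOPWORDS = frozenset({"کے", "کی", "کا", "اور", "میں", "سے", "ہے"})

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Urdu/Arabic punctuation: full stop, comma, question mark, semicolon.
URDU_PUNCT_RE = re.compile(r"[\u06D4\u060C\u061F\u061B]")
# Arabic diacritics (fathatan through sukun).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652]")
# Arabic-Indic and extended (Urdu) digits.
URDU_DIGITS_RE = re.compile(r"[\u0660-\u0669\u06F0-\u06F9]")
# Anything that is neither in the Arabic Unicode block nor whitespace.
NON_URDU_RE = re.compile(r"[^\u0600-\u06FF\s]")


def clean_urdu(text, stopwords=URDU_STOPWORDS):
    """Return a cleaned, space-joined token string (assumed pipeline)."""
    text = URL_RE.sub(" ", text)
    text = URDU_PUNCT_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)      # diacritics are deleted, not spaced
    text = URDU_DIGITS_RE.sub(" ", text)
    text = NON_URDU_RE.sub(" ", text)       # Latin letters, ASCII digits, symbols
    tokens = [t for t in text.split() if t not in stopwords]
    return " ".join(tokens)
```

Deleting diacritics outright (rather than replacing them with spaces) keeps words intact, since diacritics sit inside words; every other removed character is replaced by a space so that neighbouring words do not merge.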
Methodology for Baseline Transformer:
Lexical Feature Extraction:
Explanation of the TF-IDF (term frequency-inverse document frequency) technique for feature extraction.
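The summary names TF-IDF but not its exact variant. As a hedged illustration, the pure-Python sketch below uses raw term counts as term frequency, the smoothed inverse document frequency idf(t) = ln((1 + N) / (1 + df(t))) + 1, and L2 normalisation of each document vector. These match the defaults of scikit-learn's TfidfVectorizer; whether the paper used the same settings is an assumption. Documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF vectors for a list of tokenized documents.

    TF: raw counts. IDF: ln((1 + N) / (1 + df)) + 1 (smoothed).
    Each document vector is L2-normalised.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency: count each term once per doc
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm else vec)
    return vocab, vectors
```

A term that appears in every document gets idf = 1, so after normalisation rarer terms carry more weight within a document.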
NLP Pre-trained Transformer-based Models:
Description and selection criteria for the mBERT, XLNet, and XLM-RoBERTa models.
Ensembling the Pre-trained Models:
Stacking ensemble used to improve on the performance of the individual models.
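In stacking, the base models' predictions become the input features of a second-level meta-learner. The summary does not name the meta-learner, so the sketch below assumes a logistic-regression meta-learner, trained by batch gradient descent, over each base model's predicted probability of "fake". The rows stand in for the three transformers' outputs and are toy numbers, not real predictions. In practice these inputs should be out-of-fold (or held-out) predictions, so the meta-learner does not learn from base-model outputs on data the base models were trained on.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Logistic-regression meta-learner (assumed choice) via batch gradient descent.

    base_probs: one row per example; each row holds every base model's
                predicted probability that the item is fake.
    labels:     0 = true news, 1 = fake news.
    """
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_stacked(row, weights, bias, threshold=0.5):
    """Combine one row of base-model probabilities into a final 0/1 label."""
    p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
    return int(p >= threshold)
```

A learned linear combiner lets the ensemble weight the more reliable base models more heavily, which a simple majority vote cannot do.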
Experimental Evaluation:
Performance Evaluation:
Comparison of results with machine learning (ML) and deep learning (DL) models.
McNemar’s Test:
Evaluation of statistical significance using McNemar's test.
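McNemar's test compares two classifiers on the same test items using only the items where they disagree: b counts items model A classifies correctly and model B does not, and c counts the reverse. The sketch below computes the statistic (|b - c| - 1)^2 / (b + c) with Edwards' continuity correction and its p-value under a chi-squared distribution with 1 degree of freedom. Whether the paper applied the continuity correction is not stated in the summary, so it is exposed as a parameter.

```python
import math


def mcnemar(y_true, pred_a, pred_b, correction=True):
    """McNemar's test for two classifiers evaluated on the same items.

    Returns (statistic, p_value, b, c), where
      b = items A gets right and B gets wrong,
      c = items A gets wrong and B gets right.
    """
    b = c = 0
    for y, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == y, pb == y
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    if b + c == 0:
        return 0.0, 1.0, b, c               # no disagreements: no evidence of a difference
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Chi-squared (1 dof) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, b, c
```

For example, with b = 15 and c = 5 the statistic is (10 - 1)^2 / 20 = 4.05, just above the 3.84 critical value, so the difference would be significant at the 0.05 level.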
Conclusion:
Summary highlighting dataset creation, model performance, and statistical significance validation.
Stats
"The dataset contains news items in Urdu from the year 2017 to the year 2023."
"F1-score of 0.924, accuracy of 0.956, precision of 0.942, recall of 0.940 and an MCC value of 0.902."
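All five reported metrics derive from the confusion matrix. The sketch below computes them for a single positive class (assumed here to be "fake") from hypothetical counts. The summary does not say how the paper aggregated its scores; since the harmonic mean of the reported precision (0.942) and recall (0.940) is about 0.941 rather than 0.924, the reported F1 was presumably computed differently (for example, averaged across classes or folds), and this sketch does not attempt to reproduce those figures.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and MCC for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Matthews correlation coefficient uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is why it is often reported alongside F1.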
Quotes
"No manual validation was performed with a limited scope."
"The proposed ensemble model shows an F1-score of 0.924."