The study focuses on the problem of detecting fake reviews in the Bengali language, which is an under-explored research area. The key highlights are:
Creation of the BFRD dataset: The authors collected 9,049 food-related reviews in Bengali from social media platforms, of which 1,339 (about 14.8%) were annotated as fake and 7,710 as non-fake by expert annotators. This is the first publicly available dataset for Bengali fake review detection.
Text conversion pipeline: The authors developed a pipeline that translates English words into their Bengali equivalents and back-transliterates Romanized Bengali into Bengali script, so that code-mixed reviews are normalized to a single script.
Text augmentation: To address the class imbalance problem, the authors utilized text augmentation techniques such as token replacement, back-translation, and paraphrasing to increase the number of fake review instances.
Ensemble model: The authors proposed a weighted ensemble model that combines four pre-trained Bengali language models: BanglaBERT Base, BanglaBERT, BanglaBERT Large, and BanglaBERT Generator. This ensemble approach outperformed individual models and other deep learning techniques.
Extensive experimentation and analysis: The authors conducted rigorous experiments to compare the performance of various deep learning and transformer-based models. They also employed the LIME text explainer framework to provide explanations for the model's predictions and analyzed the misclassification categories.
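The weighted ensemble idea from the highlights can be illustrated with a minimal soft-voting sketch. This summary does not state how the authors chose or applied their weights, so the scheme here (a weighted average of per-model class probabilities) is an assumption, and the function names, probabilities, and weights are all hypothetical.

```python
from typing import List, Sequence


def weighted_ensemble(probs_per_model: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-model class probabilities for one review."""
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probs_per_model, weights):
        for cls, p in enumerate(probs):
            # Normalizing by the total keeps the result a probability distribution.
            combined[cls] += (weight / total) * p
    return combined


def predict(probs_per_model: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest combined probability."""
    combined = weighted_ensemble(probs_per_model, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical outputs of four models for one review: [P(non-fake), P(fake)].
example_probs = [
    [0.30, 0.70],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.55, 0.45],
]
# Hypothetical weights, e.g. each model's validation score (an assumption).
example_weights = [0.98, 0.97, 0.99, 0.96]

label = predict(example_probs, example_weights)  # 0 = non-fake, 1 = fake
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is one common reason to prefer soft voting when the member models are closely related.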
The proposed ensemble model achieved a weighted F1-score of 0.9843 on the BFRD dataset, demonstrating its effectiveness in detecting fake Bengali reviews.
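For readers unfamiliar with the metric, a weighted F1-score averages the per-class F1 scores, with each class weighted by its share of the true labels. The sketch below follows that standard definition in plain Python; the function name and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")

    total = len(y_true)
    score = 0.0
    # Classes absent from y_true have zero support and contribute nothing.
    for label in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true instances of this class
        score += (support / total) * f1
    return score


# Toy labels: 1 = fake, 0 = non-fake (made up for illustration).
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 1]
toy_score = weighted_f1(toy_true, toy_pred)
```

Because each class is weighted by its support, the majority class dominates this metric on imbalanced data, which is worth keeping in mind when comparing scores across datasets with different class ratios.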
Key insights drawn from the original content by G. M. Shahar... at arxiv.org, 05-07-2024:
https://arxiv.org/pdf/2308.01987.pdf