Core Concept
Large language models can effectively perform sentiment analysis on Bangla text through zero- and few-shot prompting, though fine-tuned models still outperform them.
Summary
This study presents a comprehensive evaluation of zero- and few-shot prompting with large language models (LLMs) for Bangla sentiment analysis. The authors developed a new dataset called MUBASE, which contains 33,606 manually annotated Bangla news tweets and Facebook comments.
The key highlights and insights from the study are:
- The authors compared the performance of classical models, fine-tuned models, and LLMs (Flan-T5, GPT-4, BLOOMZ) in both zero-shot and few-shot settings.
- Fine-tuned models, particularly the monolingual BanglaBERT, consistently outperformed the LLMs across various metrics.
- While the LLMs surpassed the random and majority baselines, they fell short compared to the fine-tuned models.
- The smaller BLOOMZ model (560M) performed better than the larger one (1.7B), suggesting that larger models need more training data to be trained effectively.
- The authors observed little to no performance difference between zero- and few-shot learning with the GPT-4 model, while BLOOMZ yielded better performance in the majority of zero- and few-shot experiments.
- The authors also explored the impact of different prompting strategies, finding that native-language instructions achieved performance comparable to English instructions for Bangla sentiment analysis (see the prompt-construction sketch after this list).
- The authors conducted an error analysis, revealing that Flan-T5 struggled to predict the negative class, BLOOMZ failed to label posts as neutral, and GPT-4 had difficulty with the positive class.
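To make the zero- vs. few-shot distinction and the instruction-language comparison concrete, here is a minimal prompt-construction sketch in Python. The instruction wording (both the English and Bangla versions), the label names, and the example posts are illustrative assumptions, not the paper's actual templates.

```python
# Minimal sketch of zero- vs. few-shot prompt construction for Bangla
# sentiment analysis. All template text below is illustrative, not the
# paper's exact wording.

LABELS = ["Positive", "Neutral", "Negative"]

EN_INSTRUCTION = (
    "Classify the sentiment of the following Bangla post as "
    "Positive, Neutral, or Negative. Answer with the label only."
)

# Native-language (Bangla) instruction; the study found such instructions
# performed comparably to English ones. Wording here is an assumption.
BN_INSTRUCTION = (
    "নিচের পোস্টটির অনুভূতি Positive, Neutral, বা Negative "
    "হিসেবে শ্রেণীবদ্ধ করুন। শুধু লেবেলটি লিখুন।"
)

def build_prompt(post, instruction=EN_INSTRUCTION, few_shot_examples=None):
    """Return a zero-shot prompt, or a few-shot prompt if examples are given."""
    parts = [instruction, ""]
    for text, label in (few_shot_examples or []):
        parts += [f"Post: {text}", f"Sentiment: {label}", ""]
    parts += [f"Post: {post}", "Sentiment:"]
    return "\n".join(parts)

if __name__ == "__main__":
    demo = [("খাবারটা দারুণ ছিল!", "Positive")]  # hypothetical labeled example
    print(build_prompt("সেবা খুবই খারাপ।"))  # zero-shot
    print(build_prompt("সেবা খুবই খারাপ।", few_shot_examples=demo))  # few-shot
```

The resulting string would then be sent to whichever LLM is under evaluation; the API call itself is model-specific and omitted here.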
Overall, the study provides valuable insights into the effectiveness of LLMs for Bangla sentiment analysis and highlights the continued need for fine-tuned models, especially for low-resource languages.
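As a companion to the fine-tuning result, the sketch below shows how a BanglaBERT-style baseline could be fine-tuned for three-class sentiment with the Hugging Face Trainer API. The csebuetnlp/banglabert checkpoint, the hyperparameters, and the in-memory toy examples are assumptions; the paper's exact training setup and the MUBASE loading code are not reproduced.

```python
# Sketch of fine-tuning a monolingual BanglaBERT baseline for three-class
# sentiment. Checkpoint name, hyperparameters, and toy data are assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "csebuetnlp/banglabert"  # assumed monolingual checkpoint
LABEL2ID = {"positive": 0, "neutral": 1, "negative": 2}

class SentimentDataset(Dataset):
    """Wraps (text, label) pairs; replace the toy pairs with MUBASE splits."""
    def __init__(self, pairs, tokenizer):
        texts, labels = zip(*pairs)
        self.enc = tokenizer(list(texts), truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor([LABEL2ID[l] for l in labels])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABEL2ID))

# Hypothetical placeholder examples; the real training split has 23,472 posts.
train = SentimentDataset([("দারুণ খবর!", "positive"),
                          ("সেবা খুবই খারাপ।", "negative")], tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="banglabert-mubase",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()
```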
Statistics
The dataset contains 33,606 Bangla news tweets and Facebook comments.
The dataset is divided into 23,472 training, 3,427 development, and 6,707 test instances.
The class distribution is: 10,560 positive, 6,197 neutral, and 16,849 negative instances.
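A quick sanity check (a hypothetical helper script, not from the paper) confirms that the split sizes and the class counts both sum to the 33,606 instances, and shows the class imbalance as percentages:

```python
# Verify that splits and class counts each sum to 33,606, and report shares.
splits = {"train": 23_472, "dev": 3_427, "test": 6_707}
classes = {"positive": 10_560, "neutral": 6_197, "negative": 16_849}

total = sum(splits.values())
assert total == sum(classes.values()) == 33_606

for label, n in classes.items():
    print(f"{label}: {n} ({n / total:.1%})")  # negative is ~50.1% of the data
```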
Quotes
"Fine-tuned models, particularly the monolingual BanglaBERT, consistently outperformed the LLMs across various metrics."
"While the LLMs surpassed the random and majority baselines, they fell short compared to the fine-tuned models."
"The performance of the smaller BLOOMZ model (560m) was better than the larger one (1.7B), suggesting the need for more training data to effectively train large models."