The study examines how prompt design affects Large Language Models (LLMs) on Targeted Sentiment Analysis (TSA) of news headlines. It compares prompting at the zero-shot and few-shot levels, evaluates predictive accuracy, and quantifies the uncertainty of LLM predictions.
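To make the two prompting levels concrete, the sketch below shows how zero-shot and few-shot prompts for headline TSA might be assembled. The function names, instruction wording, label set, and demonstration headlines are illustrative assumptions, not the prompts used in the paper.

```python
# Illustrative sketch of zero-shot vs. few-shot prompt construction for
# targeted sentiment analysis (TSA) of news headlines. All wording and
# labels here are assumptions; the paper's actual prompts may differ.

def build_zero_shot_prompt(headline: str, target: str) -> str:
    """Zero-shot: the task instruction alone, with no labeled examples."""
    return (
        f"Classify the sentiment expressed toward the entity '{target}' "
        f"in the news headline below as positive, negative, or neutral. "
        f"Answer with a single label.\n\n"
        f"Headline: {headline}"
    )


def build_few_shot_prompt(headline: str, target: str,
                          examples: list[tuple[str, str, str]]) -> str:
    """Few-shot: the same instruction preceded by labeled demonstrations."""
    demos = "\n\n".join(
        f"Headline: {h}\nTarget: {t}\nSentiment: {label}"
        for h, t, label in examples
    )
    return (
        "Classify the sentiment expressed toward the target entity in "
        "each news headline as positive, negative, or neutral.\n\n"
        f"{demos}\n\n"
        f"Headline: {headline}\nTarget: {target}\nSentiment:"
    )


demos = [
    ("Local charity shatters fundraising record", "charity", "positive"),
    ("Minister resigns amid corruption probe", "Minister", "negative"),
]
print(build_few_shot_prompt("Tech giant unveils new headquarters",
                            "Tech giant", demos))
```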
Fine-tuned encoder models such as BERT perform strongly on TSA but require labeled training data. LLMs, by contrast, offer a versatile alternative that needs no fine-tuning, though the consistency of their performance depends on prompt design.
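As a reference point for the encoder baseline mentioned above, a typical setup encodes the headline and target as a sentence pair and classifies into three sentiment classes. This is a minimal sketch assuming the Hugging Face transformers library; the checkpoint name is a placeholder, and a real baseline would be fine-tuned on labeled TSA data first.

```python
# Minimal sketch of a BERT-style TSA classifier using Hugging Face
# transformers. The checkpoint is a placeholder and the classification
# head below is untrained; a real baseline would be fine-tuned on
# labeled (headline, target, sentiment) examples first.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3)  # e.g., negative / neutral / positive

# Encode headline and target as a sentence pair, a common TSA setup.
inputs = tokenizer("Minister resigns amid corruption probe", "Minister",
                   return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # class probabilities (meaningless until fine-tuned)
```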
The study compares LLMs against BERT models on Croatian, English, and Polish datasets. Results show that more prescriptive prompts improve predictive accuracy, though the size of the gain varies by model. LLM uncertainty quantification methods capture the subjectivity of the task but do not align with human inter-annotator agreement.
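One common way to quantify such uncertainty, sketched below under the assumption of repeated sampling at nonzero temperature, is to score each headline by the entropy of the model's sampled label distribution. This is an illustrative method, not necessarily the exact quantification used in the paper.

```python
# Illustrative uncertainty estimate: sample the LLM several times for the
# same (headline, target) pair and compute the Shannon entropy of the
# resulting label distribution. 0.0 bits = all samples agree; higher
# values = more disagreement, i.e., more model uncertainty.
import math
from collections import Counter


def label_entropy(sampled_labels: list[str]) -> float:
    """Entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(sampled_labels)
    n = len(sampled_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# e.g., 10 sampled predictions for one headline/target pair:
samples = ["negative"] * 7 + ["neutral"] * 3
print(f"uncertainty = {label_entropy(samples):.3f} bits")  # ≈ 0.881
```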
Overall, the research offers insight into the potential of LLMs for TSA of news headlines and highlights the importance of prompt design in maximizing their performance.
Key insights distilled from https://arxiv.org/pdf/2403.00418.pdf by Jana... (arxiv.org, 03-04-2024).