The study examines the impact of prompt design on Large Language Models (LLMs) for Targeted Sentiment Analysis (TSA) of news headlines. It compares zero-shot and few-shot prompting, evaluates predictive accuracy, and quantifies uncertainty in LLM predictions.
Fine-tuned encoder models such as BERT show strong TSA performance but require labeled datasets. In contrast, LLMs offer a versatile approach without the need for fine-tuning. However, the consistency of their performance depends heavily on prompt design.
The study uses Croatian, English, and Polish datasets to compare LLMs and BERT models. Results show that more prescriptive prompts improve predictive accuracy, although the effect varies across models. LLM uncertainty quantification methods reflect the subjectivity of the task but do not align with human inter-annotator agreement.
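To make the prompting setup concrete, the sketch below shows one plausible way to build zero-shot and few-shot TSA prompts for a headline-entity pair and to estimate prediction uncertainty by sampling a model several times and measuring label agreement. The prompt wording, the `call_llm` placeholder, and the entropy-based uncertainty score are illustrative assumptions, not the authors' exact templates or methods.

```python
import random
from collections import Counter
from math import log2

LABELS = ["positive", "negative", "neutral"]


def zero_shot_prompt(headline: str, entity: str) -> str:
    """Zero-shot TSA prompt: task description only, no labeled examples."""
    return (
        f"Classify the sentiment expressed toward '{entity}' in the news headline "
        "below as positive, negative, or neutral.\n"
        f"Headline: {headline}\nSentiment:"
    )


def few_shot_prompt(headline: str, entity: str,
                    examples: list[tuple[str, str, str]]) -> str:
    """Few-shot TSA prompt: prepend labeled (headline, entity, label) demonstrations."""
    demos = "\n\n".join(
        f"Headline: {h}\nEntity: {e}\nSentiment: {lbl}" for h, e, lbl in examples
    )
    return (
        "Classify the sentiment expressed toward the entity in each headline "
        "as positive, negative, or neutral.\n\n"
        f"{demos}\n\nHeadline: {headline}\nEntity: {entity}\nSentiment:"
    )


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; a real setup would query a model API here."""
    return random.choice(LABELS)  # stand-in so the sketch runs end to end


def predict_with_uncertainty(prompt: str, n_samples: int = 10) -> tuple[str, float]:
    """Sample the model repeatedly; use label entropy as a simple uncertainty score."""
    votes = Counter(call_llm(prompt) for _ in range(n_samples))
    label, _ = votes.most_common(1)[0]
    entropy = -sum((c / n_samples) * log2(c / n_samples) for c in votes.values())
    return label, entropy


if __name__ == "__main__":
    headline = "Company X shares plunge after earnings miss"
    prompt = zero_shot_prompt(headline, "Company X")
    label, uncertainty = predict_with_uncertainty(prompt)
    print(f"Predicted sentiment: {label}, uncertainty (entropy): {uncertainty:.2f}")
```

In this kind of setup, higher entropy across repeated samples signals less stable predictions, which is one simple way uncertainty could be quantified and compared against human inter-annotator agreement.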
Overall, the research provides insights into the potential of LLMs for TSA of news headlines and highlights the importance of prompt design in maximizing their performance.