This paper investigates the performance of conversational large language models (LLMs) in debiasing news articles. The authors designed an evaluation checklist from the perspective of news editors, covering criteria such as preserving information, preserving context, language fluency, and retaining the author's style.
The authors collected text generations from three popular conversational models (ChatGPT, GPT-4, Llama 2) and a fine-tuned T5 baseline on a subset of a publicly available media bias dataset. Expert news editors from international media organizations then evaluated the model outputs against the checklist.
The results show that while the conversational LLMs outperform the baseline at correcting bias and producing grammatically correct text, they struggle to preserve information, context, and the author's style. Some models, including ChatGPT, introduced unnecessary changes that can alter the author's style and create misinformation. The authors also found that the models are not as proficient as domain experts at evaluating the quality of debiased outputs.
The authors conclude that deploying these tools as a fully automatic news editor would be risky, as they can create misinformation. They plan to expand the dataset and to investigate advanced methods for automating the evaluation criteria and adapting the models accordingly.
Key insights from the paper by Ipek Baris S... at arxiv.org, 04-10-2024
https://arxiv.org/pdf/2404.06488.pdf