The study assesses GPT-4's performance on sentence simplification using an error-based human evaluation. Results show that GPT-4 generates fewer errors but still struggles with lexical paraphrasing, and that automatic metrics lack the sensitivity to evaluate its high-quality simplifications.
The research compares GPT-4 with the Control-T5 model on sentence simplification, focusing on fluency, meaning preservation, and simplicity; GPT-4 generally outperforms Control-T5 on all three dimensions.
An error-based human evaluation framework is designed to identify key failure points in these aspects of sentence simplification, aiming to balance interpretability and consistency in the evaluations.
Key insights distilled from the paper by Xuanxin Wu, Y... at arxiv.org, 03-11-2024: https://arxiv.org/pdf/2403.04963.pdf