The study assesses GPT-4's performance on sentence simplification using an error-based human evaluation. Results show that GPT-4 generates fewer erroneous simplifications than the supervised baseline but still struggles with lexical paraphrasing, and that current automatic metrics lack the sensitivity to evaluate GPT-4's high-quality simplifications.
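To make the claim about automatic metrics concrete, the sketch below scores a candidate simplification with SARI via the Hugging Face `evaluate` package. Note the assumptions: the summary above does not name specific metrics, so the choice of SARI (a standard simplification metric) and the example sentences are purely illustrative.

```python
# A minimal sketch of scoring a simplification with SARI, a standard
# automatic metric for sentence simplification (assumed here; the summary
# does not name the metrics used). Requires: pip install evaluate
import evaluate

sari = evaluate.load("sari")

# Hypothetical example: one source sentence, one system output,
# and several human reference simplifications.
sources = ["About 95 species are currently accepted."]
predictions = ["About 95 species are currently known."]
references = [[
    "About 95 species are currently known.",
    "About 95 species are now accepted.",
    "95 species are now accepted.",
]]

# SARI compares the system output against both the source and the
# references, rewarding well-chosen additions, deletions, and keeps.
result = sari.compute(sources=sources, predictions=predictions, references=references)
print(result)  # e.g. {'sari': ...} on a 0-100 scale
```

Because SARI rewards n-gram overlap with a fixed reference set, a fluent output that paraphrases lexically away from the references can score poorly, which is consistent with the summary's point that such metrics lack sensitivity to high-quality simplifications.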
The research compares GPT-4 against the supervised Control-T5 model on sentence simplification, focusing on fluency, meaning preservation, and simplicity. GPT-4 generally outperforms Control-T5 across all three dimensions.
An error-based human evaluation framework is designed to identify key failure modes in the important aspects of sentence simplification, aiming to balance interpretability and consistency in the evaluation.
Key insights drawn from: Xuanxin Wu, Y..., arxiv.org, 03-11-2024
https://arxiv.org/pdf/2403.04963.pdf