The study assesses GPT-4's performance in sentence simplification using an error-based human evaluation. Results show that GPT-4 generates fewer erroneous simplifications but struggles with lexical paraphrasing, and that automatic metrics lack the sensitivity to assess the high-quality simplifications GPT-4 produces.
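To make the automatic-metric point concrete, the sketch below scores a system output with SARI, a widely used simplification metric; it assumes the `easse` package's `corpus_sari` function, and the sentences are hypothetical stand-ins rather than data from the study.

```python
# A minimal sketch of computing SARI with the `easse` package
# (pip install easse); sentences are invented for illustration,
# not taken from the paper's evaluation data.
from easse.sari import corpus_sari

orig_sents = ["The committee deliberated at length before reaching a verdict."]
sys_sents = ["The committee talked for a long time before deciding."]
refs_sents = [["The committee discussed for a long time before making a decision."]]  # one reference set

score = corpus_sari(orig_sents=orig_sents, sys_sents=sys_sents, refs_sents=refs_sents)
print(f"SARI: {score:.2f}")
```

SARI rewards correct keep, add, and delete n-gram operations against references and the source, which is exactly the kind of surface-level signal that can be too coarse to separate already high-quality outputs.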
The research compares GPT-4 and Control-T5 models in sentence simplification, focusing on fluency, meaning preservation, and simplicity. GPT-4 generally outperforms Control-T5 across all dimensions.
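For context, a GPT-4 simplification query might look like the following sketch using the OpenAI Python client; the prompt wording is a hypothetical stand-in, not the study's actual setup.

```python
# A hedged sketch of requesting a sentence simplification from GPT-4 via
# the OpenAI chat API; the instruction text below is an assumed prompt,
# not the one used in the study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def simplify(sentence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Rewrite the sentence so it is easier to read while preserving its meaning."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content

print(simplify("The committee deliberated at length before reaching a verdict."))
```

Control-T5, the supervised baseline, is typically conditioned on explicit control tokens (such as a target compression ratio), so the comparison pits free-form prompting against trained, controllable generation.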
An error-based human evaluation framework is designed to identify key failure modes in the important aspects of sentence simplification, aiming to balance interpretability and consistency in the evaluations.
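One plausible shape for such a framework is sketched below: annotators label each system output with error types, and the counts are aggregated per system. The error taxonomy and field names here are illustrative assumptions, not the paper's exact scheme.

```python
# An illustrative sketch of aggregating human error annotations per system;
# the error taxonomy is an assumed stand-in for the study's categories.
from collections import Counter
from dataclasses import dataclass, field

ERROR_TYPES = {"fluency", "meaning_change", "hallucination", "lack_of_simplicity"}

@dataclass
class Annotation:
    system: str               # e.g. "GPT-4" or "Control-T5"
    sentence_id: int
    errors: list[str] = field(default_factory=list)  # labels from one annotator

def tally_errors(annotations: list[Annotation]) -> dict[str, Counter]:
    """Count how often each error type was assigned to each system."""
    counts: dict[str, Counter] = {}
    for ann in annotations:
        bucket = counts.setdefault(ann.system, Counter())
        for err in ann.errors:
            if err not in ERROR_TYPES:
                raise ValueError(f"unknown error label: {err}")
            bucket[err] += 1
    return counts

annotations = [
    Annotation("GPT-4", 1),  # no errors found
    Annotation("Control-T5", 1, ["meaning_change", "lack_of_simplicity"]),
]
print(tally_errors(annotations))
```

Counting named error types, rather than assigning a single holistic score, is what makes such an evaluation interpretable (each count names a failure) while keeping annotations comparable across annotators.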
Key ideas extracted from https://arxiv.org/pdf/2403.04963.pdf by Xuanxin Wu, Y... (arxiv.org, 03-11-2024).