The authors present PARAPHRASUS, a multi-faceted benchmark for evaluating paraphrase detection models. The benchmark consists of 10 datasets covering a broad spectrum of paraphrase phenomena, including adversarial, semantic, and lexical variations.
The datasets are divided into three evaluation objectives.
The authors evaluate both large language models (LLMs) and a fine-tuned XLM-RoBERTa model on the PARAPHRASUS benchmark. The results reveal that no single model performs consistently well across all aspects of paraphrase detection, highlighting the need for continued system development. The authors also provide insights into the strengths and weaknesses of different prompting strategies for LLMs, and into the challenges of training efficient paraphrase detection models.
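To make the multi-dataset evaluation concrete, the following is a minimal sketch of how a model's binary paraphrase predictions could be scored per dataset and inspected for consistency across the benchmark. The dataset names, predictions, and labels here are illustrative placeholders, not the actual PARAPHRASUS splits or results.

```python
# Hedged sketch: per-dataset accuracy aggregation for a paraphrase
# detection benchmark. All data below is made up for illustration.

def accuracy(preds, golds):
    """Fraction of binary predictions matching gold labels."""
    assert len(preds) == len(golds) and golds
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def evaluate_per_dataset(results):
    """Map {dataset_name: (preds, golds)} to {dataset_name: accuracy}."""
    return {name: accuracy(p, g) for name, (p, g) in results.items()}

# Toy example with two hypothetical datasets.
results = {
    "dataset_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "dataset_b": ([0, 0, 1], [0, 1, 1]),
}
scores = evaluate_per_dataset(results)
print(scores)
```

Reporting scores per dataset rather than as a single average is what surfaces the paper's key finding that no model is uniformly strong across all paraphrase phenomena.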
Source: by Andrianos Mi..., arxiv.org, 09-19-2024, https://arxiv.org/pdf/2409.12060.pdf