This paper presents a comprehensive study of sentence readability in the medical domain. The authors introduce MEDREADME, a new dataset of 4,520 sentences, each manually rated for readability and annotated with fine-grained complex spans. The dataset covers a diverse range of medical resources, including scientific papers, encyclopedia entries, and plain-language summaries.
The analysis reveals that medical jargon, especially "Google-Hard" terms that are difficult for laypeople to understand, affects sentence readability more than other linguistic features such as sentence length and grammatical complexity. The authors also find that the readability of simplified medical texts varies greatly across resources, suggesting that not all "plain language" versions are equally accessible.
To address this, the authors benchmark and improve several state-of-the-art readability metrics, including unsupervised, supervised, and prompting-based methods. They find that incorporating a single feature capturing the number of jargon spans can significantly boost the performance of existing readability formulas. Additionally, the authors develop a fine-grained complex span identification model that can accurately detect different types of complex terms, including medical jargon, abbreviations, and general complex words.
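The jargon-feature idea can be sketched as follows. This is a minimal, hypothetical illustration rather than the paper's actual model: `naive_flesch` is a crude single-sentence Flesch Reading Ease estimate using a vowel-group syllable heuristic, and `alpha` is an illustrative penalty weight, not a coefficient learned from MEDREADME.

```python
import re

def naive_flesch(sentence: str) -> float:
    # Flesch Reading Ease for a single sentence, with a crude
    # vowel-group heuristic standing in for real syllable counting.
    words = re.findall(r"[A-Za-z']+", sentence)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    n = max(1, len(words))
    return 206.835 - 1.015 * n - 84.6 * (syllables / n)

def jargon_adjusted_score(sentence: str, jargon_spans: list,
                          alpha: float = 10.0) -> float:
    # Hypothetical augmentation: subtract a penalty proportional to the
    # number of annotated jargon spans. alpha is an illustrative weight
    # chosen for this sketch, not taken from the paper.
    return naive_flesch(sentence) - alpha * len(jargon_spans)

plain = "The doctor checked the patient's heart."
complex_s = "Echocardiography revealed mitral regurgitation with tachycardia."
print(round(jargon_adjusted_score(plain, []), 1))
print(round(jargon_adjusted_score(
    complex_s,
    ["Echocardiography", "mitral regurgitation", "tachycardia"]), 1))
```

In a supervised setting, `alpha` would be fit on annotated data (e.g., by regressing human readability ratings on the base formula plus the span count); the point is only that a single jargon-count feature shifts the score in the intuitive direction.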
Overall, this study provides valuable insights into the factors that contribute to the complexity of medical texts and offers practical solutions for improving the readability of such content, which is crucial for enhancing public health literacy.