This paper introduces a new task of automatically generating lay definitions to simplify complex medical terms into patient-friendly language. The authors created the README dataset, a large collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions, with context-aware lay definitions manually annotated by domain experts.
To improve the quality of the dataset, the authors developed a data-centric Human-AI pipeline called Examiner-Augmenter-Examiner (EAE), which combines human experts and AI models to filter, augment, and select high-quality data. The authors then fine-tuned models on README and employed a Retrieval-Augmented Generation (RAG) method to reduce hallucinations and improve the quality of model outputs.
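The RAG idea described above can be illustrated with a minimal sketch: retrieve supporting definitions for a query, then ground the generation prompt in that retrieved evidence. All names, the token-overlap retriever, and the knowledge-base entries below are hypothetical simplifications, not the paper's actual implementation.

```python
# Minimal RAG sketch (hypothetical; the paper's retriever and generator differ).

def tokenize(text):
    """Lowercase whitespace tokenization, enough for a toy retriever."""
    return set(text.lower().split())

def retrieve(query, knowledge_base, k=2):
    """Rank reference entries by token overlap with the query, return top-k."""
    scored = sorted(
        knowledge_base,
        key=lambda entry: len(tokenize(query) & tokenize(entry)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(term, retrieved):
    """Ground the generator in retrieved evidence to curb hallucination."""
    evidence = "\n".join(f"- {r}" for r in retrieved)
    return (f"Using only the evidence below, write a lay definition of '{term}'.\n"
            f"Evidence:\n{evidence}")

# Hypothetical knowledge base of reference definitions.
kb = [
    "hypertension: persistently high blood pressure in the arteries",
    "hyperlipidemia: high levels of fat in the blood",
    "tachycardia: a faster than normal heart rate",
]

query = "hypertension high blood pressure"
prompt = build_prompt("hypertension", retrieve(query, kb))
print(prompt)
```

In the actual pipeline, the prompt would be sent to a language model; grounding the prompt in retrieved evidence is what constrains the model away from fabricated definitions.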
The extensive automatic and human evaluations demonstrate that open-source, mobile-friendly models can achieve or even exceed the performance of state-of-the-art closed-source large language models like ChatGPT when fine-tuned with high-quality data. This research represents a significant step in bridging the knowledge gap in patient education and advancing patient-centric healthcare solutions.
Key insights from: Zonghai Yao et al., arxiv.org, 04-17-2024, https://arxiv.org/pdf/2312.15561.pdf