Core Concepts
The work introduces an Automated Interactive Evaluation (AIE) framework with a State-Aware Patient Simulator (SAPS) for assessing medical Large Language Models (LLMs) in simulated clinical scenarios.
Summary
Large Language Models (LLMs) perform well in human-facing interactions, but their use in healthcare remains limited. The Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS) aim to narrow that gap by simulating doctor-patient interactions realistically. AIE evaluates models dynamically through multi-turn simulations, giving a closer view of LLM capabilities in clinical tasks. SAPS consists of a state tracker, a memory bank, and a response generator, which together produce the patient side of the dialogue. Experimental validation shows that AIE's results align with human evaluations, supporting its use for testing medical LLMs.
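To make the three SAPS components concrete, here is a minimal Python sketch of one simulated patient turn. It is an illustration under assumptions, not the authors' implementation: the state labels (`ask`, `advise`, `other`), the keyword rules, the `MemoryBank` fields, and the sample patient facts are all hypothetical, and a real simulator would presumably use an LLM where this sketch uses string matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical action categories; the summary does not list the tracker's real states.
ASK, ADVISE, OTHER = "ask", "advise", "other"


def track_state(doctor_utterance: str) -> str:
    """Toy state tracker: classify the doctor's turn with simple keyword rules."""
    text = doctor_utterance.strip().lower()
    if text.endswith("?"):
        return ASK
    if text.startswith(("i recommend", "you should", "diagnosis:")):
        return ADVISE
    return OTHER


@dataclass
class MemoryBank:
    """Holds the simulated patient's record and the dialogue so far (assumed layout)."""
    facts: Dict[str, str]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def lookup(self, doctor_utterance: str) -> Optional[str]:
        """Return the first stored fact whose keyword appears in the question."""
        text = doctor_utterance.lower()
        for keyword, answer in self.facts.items():
            if keyword in text:
                return answer
        return None


def generate_response(state: str, memory: MemoryBank, doctor_utterance: str) -> str:
    """Toy response generator: choose a reply based on the tracked state."""
    if state == ASK:
        answer = memory.lookup(doctor_utterance)
        return answer if answer is not None else "I'm not sure about that."
    if state == ADVISE:
        return "Okay, thank you, doctor."
    return "Could you explain what you mean?"


def patient_turn(memory: MemoryBank, doctor_utterance: str) -> str:
    """One simulated turn: track state, generate a reply, record the exchange."""
    state = track_state(doctor_utterance)
    reply = generate_response(state, memory, doctor_utterance)
    memory.history.append((doctor_utterance, reply))
    return reply


# Made-up patient record, for illustration only.
memory = MemoryBank(facts={
    "cough": "I've had a dry cough for three days.",
    "fever": "No fever.",
})
print(patient_turn(memory, "Do you have a cough?"))
print(patient_turn(memory, "You should rest and drink fluids."))
```

In a multi-turn evaluation of the kind AIE describes, the doctor model under test would supply each `doctor_utterance`, and the accumulated `memory.history` would be the transcript that is scored afterwards.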
Statistics
Large Language Models have significantly advanced various fields.
Previous work focuses on measuring medical knowledge through examinations.
AIE and SAPS provide a realistic platform for assessing LLMs through doctor-patient simulations.
Empirical validation demonstrates the effectiveness of the AIE framework.
Results show superior stability and correlation with human behavior in SAPS simulations.
Comparative evaluation highlights differences between closed-source and open-source models.
Automatic metrics reveal distinctions in diagnostic capabilities among models on different datasets.