Evaluating Large Language Models with a State-Aware Patient Simulator
Core Concepts
Introducing the Automatic Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS) to assess LLMs on clinical tasks.
Abstract
Large Language Models (LLMs) are proficient in human interaction, but their abilities remain underexplored in the medical domain.
AIE and SAPS bridge the gap between traditional LLM evaluations and clinical demands.
SAPS categorizes the actions of doctor LLMs and adjusts its state accordingly, enabling accurate, context-appropriate patient responses.
Empirical validation shows that AIE's results align closely with human evaluations of medical LLMs.
Comparative evaluation shows closed-source models outperform open-source ones.
Metrics correlation analysis highlights specificity's importance over accuracy in diagnostic dialogues.
Turn-level analysis shows that information gathering improves as the dialogue progresses.
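The core idea of a state-aware patient simulator, as described above, is that the patient's reply depends on both the category of the doctor LLM's action and on what has already been revealed in the dialogue. The sketch below illustrates this mechanism in minimal form; the action taxonomy (`ASK_SYMPTOM`, `REQUEST_EXAM`, `GIVE_DIAGNOSIS`, `OTHER`), the keyword classifier, and the `StateAwarePatient` class are all illustrative assumptions, not the paper's actual implementation (which uses an LLM for both classification and response generation).

```python
from dataclasses import dataclass, field

# Hypothetical action categories for doctor utterances; the paper's actual
# taxonomy may differ -- this is an illustrative sketch, not the authors' code.
def classify_action(utterance: str) -> str:
    """Toy keyword classifier standing in for an LLM-based action classifier."""
    text = utterance.lower()
    if "diagnos" in text:
        return "GIVE_DIAGNOSIS"
    if any(w in text for w in ("test", "exam", "x-ray", "blood")):
        return "REQUEST_EXAM"
    if "?" in text:
        return "ASK_SYMPTOM"
    return "OTHER"

@dataclass
class StateAwarePatient:
    """Minimal patient simulator: the response depends on the classified
    action and on which symptoms have already been revealed (the state)."""
    symptoms: dict          # symptom name -> patient's description
    exams: dict             # exam name -> result string
    revealed: set = field(default_factory=set)

    def respond(self, doctor_utterance: str) -> str:
        action = classify_action(doctor_utterance)
        if action == "ASK_SYMPTOM":
            for name, desc in self.symptoms.items():
                if name in doctor_utterance.lower() and name not in self.revealed:
                    self.revealed.add(name)   # state update: don't repeat info
                    return desc
            return "I have not noticed anything else."
        if action == "REQUEST_EXAM":
            for name, result in self.exams.items():
                if name in doctor_utterance.lower():
                    return f"The {name} result is: {result}"
            return "I have not had that test done."
        if action == "GIVE_DIAGNOSIS":
            return "Thank you, doctor."
        return "Could you explain what you mean?"

patient = StateAwarePatient(
    symptoms={"cough": "Yes, I have had a dry cough for a week.",
              "fever": "My temperature was 38.5 C last night."},
    exams={"blood": "elevated white cell count"},
)
print(patient.respond("Do you have a cough?"))
print(patient.respond("Let's order a blood test."))
```

In the AIE setting, a doctor LLM would interact with such a simulator over multiple turns, and metrics (e.g., diagnostic accuracy, information coverage) would be computed from the resulting transcript.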
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Stats
"Our extensive experimental validation demonstrates the effectiveness of the AIE framework."
"The SAPS demonstrates a performance closer to humans compared to other patient simulators."
"The SAPS shows improvements across all metrics when compared to the standard GPT-4 model."
Quotes
"The SAPS can reliably play the role of a patient, facilitating effective and realistic interactions with doctor LLMs."
"The AIE framework offers insights into their applicability in real-world clinical settings."