Benchmarking Large Language Models' Clinical Skills Using an AI-Structured Clinical Examination (AI-SCE) Framework
This article introduces MedQA-CS, a comprehensive AI-SCE framework for evaluating the clinical skills of large language models (LLMs). It goes beyond traditional multiple-choice question benchmarks by assessing LLMs' ability to follow complex clinical instructions and to interact with simulated patients.
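To make the two evaluation ingredients concrete, below is a minimal sketch, not taken from the paper, of how an OSCE-style encounter between a candidate LLM and a simulated patient might be run and then graded by an examiner model. The function `llm_chat`, the prompts, and the rubric handling are all illustrative assumptions standing in for whatever chat-completion API and checklists a real implementation would use.

```python
# A hedged, hypothetical sketch of an OSCE-style evaluation loop:
# a candidate LLM interviews a scripted simulated patient, and an
# examiner LLM scores the resulting transcript against a rubric.
# None of these names come from the MedQA-CS paper itself.

def llm_chat(system: str, messages: list[dict]) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("plug in your LLM provider here")

def run_encounter(case_script: str, max_turns: int = 8) -> list[dict]:
    """Let the candidate LLM take a history from a simulated patient."""
    transcript: list[dict] = []
    for _ in range(max_turns):
        # Candidate model plays the clinician, asking focused questions.
        question = llm_chat(
            "You are a medical student taking a patient history. "
            "Ask one focused question at a time.",
            transcript,
        )
        transcript.append({"role": "assistant", "content": question})
        # Simulated patient answers strictly from the case script,
        # revealing information only when asked.
        answer = llm_chat(
            "You are a standardized patient. Answer only from this case "
            f"script and volunteer nothing unprompted:\n{case_script}",
            transcript,
        )
        transcript.append({"role": "user", "content": answer})
    return transcript

def score_encounter(transcript: list[dict], rubric: str) -> str:
    """Have an examiner LLM grade the dialogue against a checklist."""
    dialogue = "\n".join(f"{m['role']}: {m['content']}" for m in transcript)
    return llm_chat(
        "You are an OSCE examiner. Score the encounter below against "
        f"the rubric, item by item.\nRubric:\n{rubric}",
        [{"role": "user", "content": dialogue}],
    )
```

The design choice worth noting is the separation of roles: the candidate, the simulated patient, and the examiner are three independently prompted model calls, which mirrors how a clinical skills exam separates the student, the standardized patient, and the grader.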