Core Concepts
Hippocrates is an open-source framework that aims to elevate the proficiency of large language models in medical reasoning and decision-making through continued pre-training, supervised fine-tuning, and reinforcement learning from AI-generated feedback.
Summary
The Hippocrates framework is designed to advance the capabilities of large language models (LLMs) in the medical domain. It consists of several key components:
- Continued Pre-training:
  - The framework utilizes a carefully curated corpus of medical text data, including clinical practice guidelines, patient summaries, and PubMedQA contexts, to further pre-train base LLMs such as LLaMA2 and Mistral (a minimal training sketch follows this list).
  - This stage aims to equip the models with domain-specific medical knowledge and terminology.
- Supervised Fine-tuning:
  - Hippocrates employs two distinct instruction tuning (IT) datasets: the General Instructions Data and the Evaluation Instructions Data.
  - The General Instructions Data is designed to enhance the models' generalization capabilities, while the Evaluation Instructions Data facilitates direct comparisons with existing medical LLMs (an instruction-formatting sketch follows this list).
  - The fine-tuning process aligns the models' outputs with clinical requirements and medical reasoning.
- Medical Preference Learning:
  - The framework incorporates a novel strategy for preference learning, leveraging the RLAIF methodology and GPT-4 to annotate preferences based on patient-doctor dialogues (a preference-learning sketch follows this list).
  - This stage aims to further align the models' outputs with the preferences and decision-making processes of medical professionals.
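The continued pre-training stage is standard causal language modeling on domain text. Below is a minimal sketch of what it could look like with the Hugging Face Transformers stack; the model choice, the file name `medical_corpus.jsonl`, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal continued pre-training sketch (Hugging Face Transformers).
# Model name, corpus file, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # or "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL corpus: one medical document per line, drawn from
# guidelines, patient summaries, and PubMedQA contexts.
corpus = load_dataset("json", data_files="medical_corpus.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True,
                       remove_columns=corpus.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hippo-cpt",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=2e-5,
                           bf16=True),
    train_dataset=tokenized,
    # mlm=False selects the causal LM objective: labels are the input
    # ids, shifted internally by the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```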
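For the supervised fine-tuning stage, the summary does not reproduce the paper's prompt format, so the sketch below shows one common way to serialize an instruction/response pair: the prompt tokens are masked so the loss is computed only on the response. The template and the helper name `build_example` are hypothetical.

```python
# Hedged SFT data-preparation sketch; the template is an assumption,
# not the prompt format actually used by Hippocrates.
PROMPT_TEMPLATE = (
    "Below is a medical question. Write a response that answers it.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_example(tokenizer, instruction, response, max_length=2048):
    """Tokenize one instruction/response pair, masking the prompt so
    the loss covers only the response tokens."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response + tokenizer.eos_token,
                             add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + response_ids)[:max_length]
    # -100 is the ignore index of PyTorch's cross-entropy loss.
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}
```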
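Finally, the summary states that preferences were annotated with GPT-4 following the RLAIF methodology but does not spell out the training objective. One widely used objective for such pairwise (chosen vs. rejected) data is direct preference optimization (DPO; Rafailov et al., 2023), sketched below as one possible approach rather than the authors' confirmed method.

```python
# DPO loss sketch for pairwise preference data; shown as one common
# option for this kind of data, not necessarily the paper's objective.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is the summed log-probability that the trainable
    policy or the frozen reference model assigns to the chosen or
    rejected response for a batch of prompts."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses,
    # anchored to the reference model via the implicit KL term.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```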
The authors introduce two advanced 7B-parameter models, one built on LLaMA2 and one on Mistral, which demonstrate superior performance on multiple medical benchmarks, including MedMCQA, MedQA, PubMedQA, and the USMLE series. Remarkably, these 7B models match, and in some cases exceed, the performance of far larger 70B models, highlighting the effectiveness of the Hippocrates framework.
The authors emphasize the importance of transparent and comprehensive access to LLM resources for advancing the field of medical AI, fostering reproducibility, and encouraging innovation. By openly sharing their training data, codebase, checkpoints, and evaluation protocols, the Hippocrates framework aims to democratize the benefits of AI research in healthcare globally.
Statistics
The continued pre-training dataset consists of 298M tokens from Medical Guidelines, PMC-Patients, and PubMedQA-contexts.
The General Instructions Data for supervised fine-tuning contains 58.7M tokens from 9 different datasets.
The Evaluation Instructions Data for supervised fine-tuning contains 124.2M tokens from the training splits of MedMCQA, MedQA, and PubMedQA.
The medical preference dataset contains 15,258 samples annotated with GPT-4 following the RLAIF methodology.
Quotes
"Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI."
"Our models not only outperform existing 7B and 13B models by a significant margin but also deliver results on par with, and in some cases exceeding, those of 70B models."