Comprehensive Extraction of Social Determinants of Health from Pediatric Patient Notes Using Large Language Models


Core Concepts
Detailed social determinants of health can be extracted from pediatric clinical notes with high accuracy using fine-tuned and in-context learning approaches with large language models.
Abstract
This paper presents a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), which contains 1,260 annotated social history sections from pediatric patient notes. The corpus captures 10 distinct social determinants of health (SDoH) categories, including living and economic stability, prior trauma, education access, substance use history, and mental health, with an overall annotator agreement of 81.9 F1. The authors explore several information extraction strategies based on large language models: fine-tuning BERT and T5, and in-context learning with GPT-4. The fine-tuned T5-2sQA model achieves the highest performance, with a micro-average F1 of 74.7% on event-level extraction. The GPT-4 in-context learning approach with 3-shot examples (+guide) achieves trigger extraction performance comparable to the fine-tuned models, with an F1 of 82.3%. The results show that detailed SDoH representations can be extracted from pediatric clinical narratives with performance approaching human-level agreement. This enables the systematic collection and utilization of SDoH information in clinical and research settings, which can support data-driven interventions to improve individual and public health outcomes for pediatric populations.
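The micro-average F1 reported in the abstract pools true positives, false positives, and false negatives across all notes before computing precision and recall. The following is a minimal sketch of that calculation, not the paper's evaluation code. It assumes each prediction is a `(category, span)` tuple and that a prediction counts as correct only on exact match; the function name `micro_f1` and the toy data are illustrative.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over notes.

    gold, pred: lists of sets of (category, span) tuples, one set per note.
    Assumption: a prediction is correct only if it exactly matches a gold tuple.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted and annotated
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, for illustration only).
gold = [
    {("Substance Use", "smokes"), ("Living Arrangement", "lives")},
    {("Mental Health", "depression")},
]
pred = [
    {("Substance Use", "smokes")},
    {("Mental Health", "depression"), ("Trauma", "bullying")},
]
score = micro_f1(gold, pred)
print(round(score, 3))
```

Because counts are pooled before averaging, frequent categories weigh more heavily in the score than rare ones.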
Stats
"5th grade", "junior year" - Education Access
"Employment: ... ", "works" - Employment
"food stamps", "food insecurity" - Food Insecurity
"lives", "foster care" - Living Arrangement
"depression", "self-harm" - Mental Health
"meth", "alcohol", "smokes" - Substance Use
"mentally abusive", "bullying" - Trauma
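To make the notion of a trigger concrete, the sketch below matches the example trigger phrases listed above against a note using simple keyword lookup. This is a naive baseline for illustration only, not the BERT, T5, or GPT-4 approaches the paper evaluates. The `TRIGGERS` table and `find_triggers` function are hypothetical names, the sample note is invented, and the elided "Employment: ..." example is left out rather than guessed.

```python
import re

# Example trigger phrases taken from the list above.
# The elided "Employment: ..." trigger is intentionally omitted.
TRIGGERS = {
    "Education Access": ["5th grade", "junior year"],
    "Employment": ["works"],
    "Food Insecurity": ["food stamps", "food insecurity"],
    "Living Arrangement": ["lives", "foster care"],
    "Mental Health": ["depression", "self-harm"],
    "Substance Use": ["meth", "alcohol", "smokes"],
    "Trauma": ["mentally abusive", "bullying"],
}


def find_triggers(text):
    """Return (category, matched_text, start_offset) tuples sorted by offset."""
    hits = []
    for category, phrases in TRIGGERS.items():
        for phrase in phrases:
            # Word boundaries keep "meth" from matching inside "method".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((category, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])


# Invented sample note, not taken from the corpus.
note = "Lives with mother, who works nights. In 5th grade. Reports bullying at school."
for category, span, offset in find_triggers(note):
    print(offset, category, repr(span))
```

Keyword lookup of this kind cannot tell whose employment or living situation is described or whether a finding is negated, which is part of why the paper turns to learned extraction models.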
Quotes
"SDoH are particularly important in pediatric populations because health disparities have a long-term impact on future attainment of health, including educational and economic success."
"Many pediatric SDoH elements are primarily documented within the clinical narratives from EHRs. Such predominance of unstructured SDoH information in the EHRs impedes the systematic collection and utilization of SDoH information in clinical and research settings, limiting the potential for data-driven inventions to improve individual and public health."

Deeper Inquiries

How can the extracted SDoH information be effectively integrated into clinical decision support systems to improve pediatric health outcomes?

The extracted SDoH information can be effectively integrated into clinical decision support systems by creating algorithms that can analyze the data and provide actionable insights to healthcare providers. By incorporating this information into the EHR system, clinicians can have a more comprehensive view of the patient's background and social determinants that may impact their health. This integration can help in identifying at-risk pediatric populations, tailoring interventions to address specific social determinants, and ultimately improving health outcomes. Additionally, decision support systems can use this information to provide personalized care plans, referrals to social services, and resources to address the identified social determinants.

What are the potential biases in the current SDoH extraction models, and how can they be mitigated to ensure fair and equitable applications?

Potential biases in the current SDoH extraction models may arise from the training data, annotation guidelines, and the inherent biases in the language models themselves. Biases in the training data can lead to underrepresentation or misrepresentation of certain social determinants, impacting the model's performance. Annotation guidelines that are not comprehensive or inclusive can also introduce biases in the extracted information. Language models may have inherent biases based on the data they were trained on, which can perpetuate stereotypes or inaccuracies in the extracted SDoH information.

To mitigate biases and ensure fair and equitable applications, it is essential to:
Diversify Training Data: Ensure that the training data is diverse and representative of the pediatric population to capture a wide range of social determinants.
Regularly Update Annotation Guidelines: Continuously review and update annotation guidelines to include a comprehensive list of social determinants and ensure inclusivity.
Bias Detection and Mitigation: Implement bias detection techniques to identify and address biases in the model's predictions. Techniques like debiasing algorithms and fairness-aware training can help mitigate biases.
Transparency and Accountability: Maintain transparency in the model's decision-making process and hold the developers accountable for addressing biases in the extraction models.
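One simple form of bias detection is to compare extraction recall across patient subgroups and flag large gaps. The sketch below illustrates that idea under stated assumptions: the subgroup labels, the record layout, and the function names `recall_by_group` and `max_recall_gap` are hypothetical and do not come from the paper, and the data is invented.

```python
from collections import defaultdict


def recall_by_group(records):
    """Compute extraction recall separately for each subgroup.

    records: iterable of (group, gold_set, pred_set), one entry per note.
    Groups with no gold annotations are skipped.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for group, gold, pred in records:
        found[group] += len(gold & pred)
        total[group] += len(gold)
    return {g: found[g] / total[g] for g in total if total[g]}


def max_recall_gap(recalls):
    """Largest difference in recall between any two subgroups."""
    return max(recalls.values()) - min(recalls.values())


# Invented records with hypothetical subgroup labels "A" and "B".
records = [
    ("A", {"lives", "works"}, {"lives", "works"}),
    ("B", {"lives", "works"}, {"lives"}),
]
recalls = recall_by_group(records)
print(recalls, max_recall_gap(recalls))
```

A large gap does not by itself explain the cause; it only indicates where training data coverage or annotation guidelines deserve a closer look.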

What other types of unstructured data, beyond clinical notes, could be leveraged to gain a more comprehensive understanding of the social determinants impacting pediatric populations?

Beyond clinical notes, several other types of unstructured data could be leveraged to gain a more comprehensive understanding of the social determinants impacting pediatric populations. Some of these data sources include:
Social Media Data: Analyzing social media posts can provide insights into the social behaviors, interactions, and lifestyle choices of pediatric populations.
Public Health Reports: Leveraging public health reports and surveys can offer valuable information on community-level social determinants such as access to healthcare, education, and environmental factors.
Census Data: Utilizing census data can help in understanding demographic trends, income levels, housing conditions, and other socioeconomic factors that influence pediatric health outcomes.
School Records: Examining school records can provide information on academic performance, attendance rates, and behavioral patterns that may be indicative of underlying social determinants.
Insurance Claims Data: Analyzing insurance claims data can offer insights into healthcare utilization patterns, access to care, and the impact of social determinants on healthcare costs and outcomes.

By integrating and analyzing these diverse sources of unstructured data, healthcare providers and researchers can gain a more holistic view of the social determinants impacting pediatric populations and tailor interventions to address these factors effectively.