Core Concepts
Large language models, particularly GPT-4, can perform on par with expert human annotators in extracting a wide range of mental health factors from adolescent social media posts, though they still exhibit some limitations in handling negation and factuality.
Abstract
The study aimed to investigate the performance of large language models (LLMs), specifically GPT-3.5 and GPT-4, in extracting mental health factors from adolescent social media posts, and to compare their performance against expert human annotations.
The researchers created a novel dataset of Reddit posts from adolescents aged 12-19, annotated by expert psychiatrists for several mental health-related categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY, and TREATMENT. They also generated synthetic datasets with GPT-3.5 and GPT-4 to assess each model's performance on text that the same model both generates and annotates.
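To make the annotation task concrete, here is a minimal sketch of how an LLM could be prompted to tag a post with the study's six categories. The category list comes from the summary above; the prompt wording, function name, and output format are illustrative assumptions, not the study's actual protocol.

```python
# Hypothetical annotation prompt builder for the six categories named in
# the study. The instruction to flag negated mentions reflects the
# negation errors discussed in the summary; the exact wording is assumed.

CATEGORIES = ["TRAUMA", "PRECARITY", "CONDITION",
              "SYMPTOMS", "SUICIDALITY", "TREATMENT"]

def build_annotation_prompt(post: str) -> str:
    """Return a single-post annotation prompt for an LLM annotator."""
    return (
        "Annotate the following social media post.\n"
        f"For each category in ({', '.join(CATEGORIES)}), list every text "
        "span that expresses that factor, or write NONE if it is absent. "
        "Mark negated or hypothetical mentions explicitly.\n\n"
        f"Post: {post}"
    )

prompt = build_annotation_prompt(
    "I am not feeling suicidal but I can't sleep at all"
)
print(prompt)
```

A post like the one above is exactly where negation handling matters: a naive annotator might tag SUICIDALITY even though the mention is negated.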
The results showed that GPT-4 performed on par with human inter-annotator agreement, particularly on the Positive Only metrics and on subcategory accuracy. Performance on synthetic data was substantially higher, suggesting that the gap reflects the complexity of real data rather than an inherent advantage of synthetic text. However, the analysis revealed that both GPT-3.5 and GPT-4 still occasionally make errors in handling negation and factuality, despite their overall strong performance.
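One plausible reading of a "Positive Only" agreement metric is an F1 score computed only over positive label assignments, so that posts where both annotators assign nothing do not inflate agreement. The sketch below implements that interpretation; the study's exact definition may differ, and the function name and example data are invented for illustration.

```python
# Sketch of a "positive only" F1 between two annotators, where each post
# is represented as the set of category labels assigned to it. Posts
# where both annotators assign no labels contribute nothing to the score.

def positive_only_f1(ann_a, ann_b):
    """F1 over positive labels, treating annotator A as the reference."""
    tp = fp = fn = 0
    for labels_a, labels_b in zip(ann_a, ann_b):
        tp += len(labels_a & labels_b)   # labels both annotators assigned
        fn += len(labels_a - labels_b)   # A assigned, B missed
        fp += len(labels_b - labels_a)   # B assigned, A did not
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented example: two annotators labelling three posts.
a = [{"SYMPTOMS"}, set(), {"TRAUMA", "SUICIDALITY"}]
b = [{"SYMPTOMS"}, set(), {"TRAUMA"}]
print(positive_only_f1(a, b))  # the empty second post is ignored
```

With this convention, agreement on the all-negative second post neither helps nor hurts, which is the point of restricting the metric to positive labels.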
The study concludes that LLMs, especially GPT-4, can be valuable tools for cost-effective and scalable monitoring and intervention in the domain of adolescent mental health, but their limitations in certain areas should be considered. The potential use of synthetic data for training task-specific models is also discussed, with the caveat that the reduced diversity in synthetic data needs to be weighed against the increased label reliability.
Example Posts
"I am not feeling suicidal but I can't sleep at all"
"My sister used to constantly bully me"
"I was harassed for years in secondary school"
"My family is quite wealthy"
"I often cut my wrists with scissors"
Quotes
"Large language models, particularly GPT-4, can perform on par with expert human annotators in extracting a wide range of mental health factors from adolescent social media posts, though they still exhibit some limitations in handling negation and factuality."
"The potential use of synthetic data for training task-specific models is also discussed, with the caveat that the reduced diversity in synthetic data needs to be weighed against the increased label reliability."