Core Concepts
The authors highlight the importance of addressing annotation quality issues in mental health datasets in order to improve the reliability of NLP models that estimate depression levels from social media texts.
Abstract
The paper discusses the challenges of predicting depression levels from social media texts, focusing on the PRIMATE dataset. It raises concerns about annotation validity and false-positive symptom labels, and advocates improved annotation methodologies. The study emphasizes the necessity of involving domain experts in the annotation process to obtain better mental health assessments.
The paper evaluates the performance of several NLP models on the PRIMATE dataset, highlighting discrepancies and areas for improvement. It also introduces a more fine-grained, evidence-based labeling scheme to reduce the risk of mislabeling and to enhance transparency. The findings underscore the need for standardized approaches to annotating mental health datasets and for closer collaboration between mental health experts and NLP practitioners.
Stats
Inter-annotator agreement, measured with Fleiss' kappa, is reported as 67% for the initial annotation and 85% after mental health professionals (MHPs) were involved (see the agreement sketch after these stats).
The dataset consists of 2003 posts.
DistilBERT achieves an F1-score of .58 for the LOI symptom on the validation set.
RoBERTa-Large performs better on the ENE, LSE, MOV, and SUI symptoms, with F1-scores ranging from .71 to .91 on the validation set (see the per-symptom evaluation sketch below).
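As a rough illustration of how the reported agreement could be computed, the sketch below applies Fleiss' kappa to a small, hypothetical posts-by-annotators label matrix using statsmodels; the actual PRIMATE annotation data and label categories are not reproduced here.

    # Minimal sketch: Fleiss' kappa for inter-annotator agreement.
    # The label matrix is hypothetical; it only illustrates the computation.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Rows = posts, columns = annotators, values = symptom label (0 = absent, 1 = present)
    labels = np.array([
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
        [1, 1, 1],
    ])

    # aggregate_raters turns raw labels into a (subjects x categories) count table
    table, _ = aggregate_raters(labels)
    print(f"Fleiss' kappa: {fleiss_kappa(table, method='fleiss'):.2f}")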
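The per-symptom F1-scores above correspond to a standard multi-label evaluation; the sketch below computes per-label F1 with scikit-learn on illustrative arrays, where the symptom columns, ground truth, and predictions are assumptions rather than the paper's data.

    # Minimal sketch: per-symptom F1 on a validation set for a multi-label classifier.
    import numpy as np
    from sklearn.metrics import f1_score

    symptoms = ["LOI", "ENE", "LSE", "MOV", "SUI"]  # symptom labels mentioned in the stats

    # Hypothetical binary ground truth and model predictions (posts x symptoms)
    y_true = np.array([[1, 0, 1, 0, 0],
                       [0, 1, 1, 0, 1],
                       [1, 1, 0, 1, 0]])
    y_pred = np.array([[1, 0, 0, 0, 0],
                       [0, 1, 1, 0, 1],
                       [1, 0, 0, 1, 0]])

    # average=None returns one F1 value per symptom column
    for name, score in zip(symptoms, f1_score(y_true, y_pred, average=None)):
        print(f"{name}: F1 = {score:.2f}")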
Quotes
"We believe that our evidence-based labelling scheme reduces the risk of mislabelling and is more transparent for further verification."
"Our findings advocate for a more rigorous approach to mental health dataset annotation, emphasizing greater involvement of domain experts."
"The release of refined annotations under a Data Use Agreement contributes a valuable resource for future research."