Key Concepts
A multilingual dataset for pharmacovigilance supports the detection of adverse drug reactions across languages.
Summary
The article introduces a multilingual corpus for pharmacovigilance that focuses on adverse drug reactions (ADRs) in German, French, and Japanese. It argues that user-generated data sources have become important for uncovering ADRs: social media can provide population-level signals for ADRs, and texts written by patients can help clinicians understand their patients better. It also discusses the challenges associated with existing clinical corpora and the resulting need for shareable corpora for ADR detection. The corpus is annotated with entity types, attributes, and relations, and is intended to contribute to the development of multilingual language models for healthcare. The article closes with experiments on named entity recognition, attribute classification, and relation extraction using XLM-RoBERTa models.
Outline:
Introduction
  Importance of ADRs in pharmacovigilance
  Significance of user-generated data sources
  Challenges with existing clinical corpora
Dataset Creation
  Multilingual corpus for ADRs in German, French, and Japanese
  Annotations for entity types, attributes, and relations
  Contribution to healthcare language models
Social Media and ADR Detection
  Utilizing social media for ADR detection
  Extracting information from patient perspectives
Dataset Challenges and Experiments
  Challenges in ADR detection across languages
  Baseline models for named entity recognition, attribute classification, and relation extraction
Future Work and Ethical Considerations
  Improving cross-lingual performance
  Extending the dataset with more diverse data
  Ethical considerations in dataset creation
Statistics
The corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types.
The German dataset was annotated by two annotators, who reached a micro-averaged F1 score of 0.77 on entities.
In the French dataset, the most frequent entity type is disorder (588 mentions), followed by drug.
The Japanese dataset is much larger than the German and French datasets; disorder and drug are its most frequent entity types.
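The micro-averaged F1 score reported for the German annotations can be illustrated with a small sketch. The summary does not say how the score was computed, so the following is an assumption: mentions are compared by exact match on span and label, one annotator is treated as the reference, and counts are pooled across all entity types before precision and recall are taken. The function name `micro_f1`, the mention format, and the example offsets and labels are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical mention format: (start offset, end offset, entity label).
Mention = Tuple[int, int, str]


def micro_f1(reference: List[Mention], candidate: List[Mention]) -> float:
    """Micro-averaged F1 over exact-match mentions, pooled across all labels."""
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # Multiset intersection: mentions both annotators marked identically.
    true_pos = sum((ref_counts & cand_counts).values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(cand_counts.values())
    recall = true_pos / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one document (not taken from the corpus).
annotator_a = [(0, 9, "drug"), (15, 23, "disorder"), (30, 38, "disorder")]
annotator_b = [(0, 9, "drug"), (15, 23, "disorder"), (40, 45, "drug")]

score = micro_f1(annotator_a, annotator_b)
print(f"micro-averaged F1: {score:.2f}")
```

Because counts are pooled before averaging, frequent entity types such as disorder and drug dominate the score, which is worth keeping in mind when comparing it across the three languages.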
Quotes
"User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs)."
"Social media content can provide population-level signals for ADRs and other health-related topics."