Core Concepts
Novel data augmentation techniques enhance disease name normalization performance by respecting the structural invariance and hierarchical properties of disease names.
Abstract
The paper examines the challenges of disease name normalization, introduces data augmentation methods tailored to the task, and evaluates their effectiveness across several baseline models. It highlights the importance of preserving semantic integrity and hierarchical structure when augmenting training data.
Challenges: Varied writing styles, semantic density, data scarcity.
Methods: Axis-Word Replacement (AR) and Multi-Granularity Aggregation (MGA); a toy illustration of the AR idea appears after this list.
Results: Improved performance across baseline models, especially on smaller datasets.
Comparison: Outperforms EDA and Back Translation methods.
Ablation Study: Removing any of the proposed augmentation components degrades performance.
Smaller Datasets: Gains are more pronounced when less training data is available.
LLM Baselines Comparison: Achieves a better tradeoff between model size and performance than LLM baselines.
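The list above names Axis-Word Replacement without spelling out the mechanics. Below is a minimal, hypothetical Python sketch of the axis-word replacement idea: a shared "axis" token (here an anatomical site) is swapped consistently in both the surface disease name and its normalized label, so the augmented pair remains semantically valid. The SITE_AXIS_WORDS list, the axis_word_replace helper, and the sample strings are illustrative assumptions, not the paper's actual procedure or data.

```python
# Hedged illustration only: a toy axis-word replacement (AR) augmenter.
# The paper's actual AR procedure and the MGA aggregation step are not
# reproduced here; the axis-word vocabulary and sample data are hypothetical.
import random

# Hypothetical axis-word vocabulary: interchangeable anatomical-site tokens.
SITE_AXIS_WORDS = ["lung", "liver", "kidney", "stomach"]

def axis_word_replace(disease_name: str, label: str, rng: random.Random):
    """Create an augmented (name, label) pair by swapping one axis word.

    The surface name and its normalized label are edited consistently,
    which reflects the structural-invariance idea the summary refers to.
    """
    tokens = disease_name.split()
    for i, tok in enumerate(tokens):
        if tok in SITE_AXIS_WORDS:
            replacement = rng.choice([w for w in SITE_AXIS_WORDS if w != tok])
            new_name = " ".join(tokens[:i] + [replacement] + tokens[i + 1:])
            new_label = label.replace(tok, replacement)
            return new_name, new_label
    return disease_name, label  # no axis word found; return pair unchanged

if __name__ == "__main__":
    rng = random.Random(0)
    # Example: swapping the site token produces a new, still-consistent pair.
    print(axis_word_replace("malignant tumor of lung", "lung cancer", rng))
```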
Stats
Our method can achieve on-par performance with ChatGPT while being over 3,000 times smaller in size.
Quotes
"Our proposed method can significantly outperform a model over 50 times larger in size."