Concepts de base
This review proposes a unified encoder-decoder framework to provide a general understanding of the building blocks of deep learning-based medical coding models and summarizes recent advanced models under the proposed framework.
Résumé
This review paper presents a unified encoder-decoder framework for automated medical coding using deep learning techniques. The key points are:
-
Encoder Modules:
- Recurrent neural networks (RNNs) like LSTM and GRU capture sequential dependencies in clinical notes.
- Convolutional neural networks (CNNs) extract local features from text.
- Attention-based and Transformer-based encoders leverage contextual information.
- Graph neural networks model structural information in medical knowledge graphs.
- Hierarchical encoders handle long clinical notes by encoding at different granularity levels.
-
Building Deep Architectures:
- Stacking multiple neural layers to build deep models.
- Residual connections and highway networks to address the vanishing gradient problem.
- Embedding injection to preserve low-level features in deep models.
-
Decoder Modules:
- Fully connected layers for basic code prediction.
- Attention-based decoders to focus on code-relevant information.
- Hierarchical decoders that leverage the hierarchical structure of medical code systems.
- Multitask decoders that jointly predict multiple coding systems.
- Few-shot decoders for handling rare medical codes.
-
Auxiliary Information:
- Leveraging code descriptions, hierarchies, and external knowledge to enhance representation learning.
- Incorporating human-in-the-loop learning to improve model performance and explainability.
The review also discusses benchmarks, real-world applications, research challenges, and future directions in automated medical coding using deep learning.
Stats
"Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents."
"The ICD system transforms diseases, symptoms, signs, and treatment procedures into standard medical codes."
"ICD-9 and ICD-10 coding systems have more than 14,000 and 68,000 codes, respectively."
"The distribution of medical codes in an EHR system is imbalanced, also known as the long-tail phenomenon."
Citations
"Accurate medical code assignment is essential in providing appropriate medical care. Properly coded medical information is vital for clinical decision-making, public health surveillance, research, and reimbursement."
"Automated diagnosis coding can also be deployed to detect missed diagnoses and adverse effects."
"The breakthrough of natural language processing with deep neural networks has led to neural classifiers with word embedding and deep learning."