Key Idea
Supervised Fine-Tuned approaches, such as RoBERTa and BINDER (PubMedBERT), outperform general-purpose Large Language Models like ChatGPT on intent detection and named entity recognition tasks in the biomedical domain.
Abstract
The paper presents a comprehensive empirical evaluation of intent detection and named entity recognition (NER) tasks in the biomedical domain. It compares the performance of Supervised Fine-Tuned (SFT) approaches against general-purpose Large Language Models (LLMs) like ChatGPT.
Key highlights:
- SFT models, such as RoBERTa and BINDER (PubMedBERT), consistently outperform LLMs like ChatGPT on intent detection across three datasets and NER across five biomedical datasets.
- PubMedBERT can outperform ChatGPT on most NER benchmarks with just 5 supervised examples, demonstrating the importance of domain-specific pretraining (a few-shot fine-tuning sketch follows this list).
- Transformer-based SFT models perform better than LSTM, CNN, and feature-based ML models, as they can effectively leverage domain-specific pretraining.
- The paper also analyzes the impact of training data size on SFT model performance, showing that BINDER (PubMedBERT) can achieve high performance with only 10% of the training data.
- The authors provide a detailed error analysis and identify key challenges, such as handling new/unapproved entities and relaxing strict entity type matching (see the strict-vs-relaxed matching sketch after the overall summary below).
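The following is a minimal sketch of the few-shot supervised regime described above, fine-tuning the public PubMedBERT checkpoint for token-level NER on five labelled sentences. The tag set, the toy sentences, and the training hyperparameters are illustrative assumptions, not the paper's exact setup:

```python
# Few-shot NER fine-tuning sketch (assumed setup, not the authors' code).
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          Trainer, TrainingArguments)

CHECKPOINT = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
LABELS = ["O", "B-Drug", "I-Drug"]              # assumed toy BIO tag set
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

# Five hand-labelled sentences stand in for the "5 supervised examples" regime.
FEW_SHOT = [
    (["Aspirin", "lowers", "fever", "."],          ["B-Drug", "O", "O", "O"]),
    (["Patients", "received", "metformin", "."],   ["O", "O", "B-Drug", "O"]),
    (["Ibuprofen", "eases", "joint", "pain", "."], ["B-Drug", "O", "O", "O", "O"]),
    (["No", "warfarin", "was", "given", "."],      ["O", "B-Drug", "O", "O", "O"]),
    (["She", "takes", "atorvastatin", "daily"],    ["O", "O", "B-Drug", "O"]),
]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

class FewShotNER(Dataset):
    """Tokenizes pre-split words and aligns BIO tags to sub-word pieces."""
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        words, tags = self.pairs[idx]
        enc = tokenizer(words, is_split_into_words=True, truncation=True,
                        padding="max_length", max_length=32)
        # Label only the first sub-word of each word; mask the rest with -100.
        labels, prev = [], None
        for wid in enc.word_ids():
            labels.append(-100 if wid is None or wid == prev else LABEL2ID[tags[wid]])
            prev = wid
        enc["labels"] = labels
        return {k: torch.tensor(v) for k, v in enc.items()}

model = AutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABELS))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pubmedbert-fewshot",
                           num_train_epochs=20,          # tiny data: many epochs
                           per_device_train_batch_size=5,
                           learning_rate=3e-5,
                           report_to="none"),
    train_dataset=FewShotNER(FEW_SHOT),
)
trainer.train()
```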
Overall, the study highlights the continued relevance of task and domain-specific approaches over general-purpose LLMs for complex biomedical language understanding tasks.
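To make the strict-vs-relaxed distinction from the error analysis concrete, here is a small self-contained sketch; the span representation and the scoring scheme are assumptions for illustration, not the paper's evaluation code:

```python
# Strict vs. relaxed entity matching sketch (assumed representation:
# an entity is a (start, end, type) character-offset tuple).
def strict_match(pred, gold):
    """Exact span boundaries AND entity type must agree."""
    return pred == gold

def relaxed_match(pred, gold):
    """Overlapping span boundaries count; the entity type is ignored."""
    (ps, pe, _), (gs, ge, _) = pred, gold
    return ps < ge and gs < pe

def f1(preds, golds, match):
    """Span-level F1 under the given matching criterion."""
    precision = sum(any(match(p, g) for g in golds) for p in preds) / len(preds)
    recall = sum(any(match(p, g) for p in preds) for g in golds) / len(golds)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [(0, 7, "Drug"), (25, 33, "Disease")]
pred = [(0, 7, "Chemical"), (25, 30, "Disease")]   # right spans, one wrong type
print(f1(pred, gold, strict_match))    # 0.0: every prediction fails strictly
print(f1(pred, gold, relaxed_match))   # 1.0: both survive relaxed matching
```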
Statistics
Biomedical queries on search engines such as Bing and Google have increased significantly over the past decade.
The CMID and KUAKE-QIC datasets used for intent detection were translated from Chinese to English, with translation accuracies of 91.75% and 97.0%, respectively.
The NER datasets cover a wide range of entity types, including drugs, diseases, chemicals, genetics, and human anatomy.
Quotes
"While recent research is centered around the development of general-purpose LLMs, that are shown to exhibit exceptional Common Sense Reasoning capabilities, we show that these models face challenges in transferring their performance to intricate biomedical domains."
"Our experiments reveal that the biomedical transformer-based PubMedBERT model outperforms few-shot prompted ChatGPT (Turbo 3.5) on 4 biomedical NER benchmarks with just 5 supervised examples."