The article introduces a methodology that combines text-to-SQL generation with retrieval-augmented generation (RAG) to answer epidemiological questions using electronic health records (EHR) and claims data. Integrating medical coding into the process significantly improves performance over simple prompting. The study shows that, although current language models are not yet accurate enough for unsupervised use, RAG offers a promising direction for improving their capabilities in an industry setting. The manually curated dataset provides a realistic, highly complex selection of epidemiological questions drawn from industry practice. Using the OMOP Common Data Model (OMOP-CDM) helps address variability in data retrieval across databases with differing data models. The methodology uses large language models and RAG to translate natural language questions into SQL queries intended to retrieve the requested database information accurately.
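As a rough illustration of the pipeline summarized above, the following minimal sketch retrieves the most similar curated question/SQL pair, looks up medical codes for terms found in the question, and assembles a prompt for a language model. It is not the authors' implementation: the example pairs, the concept IDs (placeholders, not real OMOP vocabulary entries), the token-overlap retriever, and the function names `retrieve`, `link_concepts`, and `build_prompt` are all assumptions made for illustration. The table and column names follow OMOP-CDM conventions.

```python
import re

# Hypothetical mini-vocabulary: medical term -> concept_id.
# The IDs are placeholders, NOT real OMOP vocabulary entries.
CONCEPTS = {
    "type 2 diabetes": 1001,
    "hypertension": 1002,
}

# Hypothetical curated (question, SQL) pairs used as the retrieval corpus.
EXAMPLES = [
    ("How many patients have hypertension?",
     "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
     "WHERE condition_concept_id = 1002;"),
    ("How many patients were exposed to metformin?",
     "SELECT COUNT(DISTINCT person_id) FROM drug_exposure "
     "WHERE drug_concept_id = 2001;"),
]


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, k=1):
    """Return the k example pairs whose question is most similar (Jaccard overlap)."""
    q = tokens(question)

    def score(pair):
        t = tokens(pair[0])
        return len(q & t) / len(q | t)

    return sorted(EXAMPLES, key=score, reverse=True)[:k]


def link_concepts(question):
    """Map medical terms mentioned in the question to their concept IDs."""
    lowered = question.lower()
    return {term: cid for term, cid in CONCEPTS.items() if term in lowered}


def build_prompt(question, k=1):
    """Assemble a few-shot prompt from retrieved examples and linked codes."""
    shots = "\n\n".join(
        f"Question: {q}\nSQL: {sql}" for q, sql in retrieve(question, k)
    )
    codes = "\n".join(
        f"- {term}: concept_id {cid}"
        for term, cid in link_concepts(question).items()
    )
    return (
        "Translate the question into SQL for an OMOP-CDM database.\n\n"
        f"Similar examples:\n{shots}\n\n"
        f"Relevant medical codes:\n{codes}\n\n"
        f"Question: {question}\nSQL:"
    )


prompt = build_prompt("How many patients have type 2 diabetes?")
print(prompt)
```

The resulting prompt would then be sent to a language model of choice. A production system would likely replace the token-overlap retriever with embedding-based search and the dictionary lookup with a query against a full medical vocabulary.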
Source: arxiv.org