Core Concept
Thought Graph is a framework for complex biological reasoning that is reported to outperform existing methods on gene set analysis while also capturing semantic relationships between terms.
Summary
1. Introduction
- Understanding links between diseases, drugs, phenotypes, genes, and biological processes is crucial.
- Analyzing gene sets reveals patterns in gene behavior across health and disease states.
- Challenges arise from weak signals of individual genes and divergent conclusions by different research groups.
2. Thought Graph Framework
- Builds on the Tree-of-Thought (ToT) architecture, using a Large Language Model (LLM) to expand thoughts.
- Uses a voter LLM to decide which candidates to pursue in later steps.
- Integrates domain-specific external knowledge bases to understand semantic connections within the Thought Graph.
3. Methodology
- Problem formulation: design a framework that generates a tree-structured graph of the terms associated with a set of genes.
- Infrastructure: Thought Graph adapts ToT as a graph generator to produce a curated tree graph.
- Thought expansion proceeds breadth-first, generating candidate nodes at each step.
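
The breadth-first expansion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `propose` and `vote` are hypothetical stand-ins for the expansion LLM and the voter LLM, and the toy terms are made up for the example rather than taken from the paper.

```python
from collections import deque
from typing import Callable, Dict, List


def expand_breadth_first(
    root: str,
    propose: Callable[[str], List[str]],
    vote: Callable[[str, List[str]], List[str]],
    max_depth: int,
) -> Dict[str, List[str]]:
    """Grow a tree level by level and return a parent -> kept-children map."""
    tree: Dict[str, List[str]] = {root: []}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not expand nodes at the depth limit
        candidates = propose(node)      # assumption: stands in for the expansion LLM
        kept = vote(node, candidates)   # assumption: stands in for the voter LLM
        for child in kept:
            if child in tree:
                continue  # skip repeats so the result stays a tree
            tree[child] = []
            tree[node].append(child)
            frontier.append((child, depth + 1))
    return tree


# Hypothetical toy data replacing real LLM calls.
TOY_CANDIDATES = {
    "cell cycle": ["mitosis", "DNA replication", "apoptosis"],
    "mitosis": ["spindle assembly", "chromosome segregation"],
}


def toy_propose(term: str) -> List[str]:
    return TOY_CANDIDATES.get(term, [])


def toy_vote(term: str, candidates: List[str]) -> List[str]:
    # Keep the first two candidates; a real voter would score them.
    return candidates[:2]


tree = expand_breadth_first("cell cycle", toy_propose, toy_vote, max_depth=2)
for parent, children in tree.items():
    print(parent, "->", children)
```

Using a queue keeps the expansion strictly level by level, so the voter sees all candidates for a node before any deeper node is expanded.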
4. Experiment & Evaluation
- Data collected from the Gene Ontology database for evaluation.
- Baselines include GSEA and several LLM approaches: input-output (IO) zero-shot prompting, chain-of-thought (CoT) prompting, and the method of Hu et al.
- Evaluation metrics include cosine similarity and similarity percentile.
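
As a rough illustration of the two metrics named above, the sketch below computes a cosine similarity between two embedding vectors and then ranks it against a background set. The summary does not define "similarity percentile", so the definition used here (the share of background scores that do not exceed the score of interest) is an assumption, and the two-dimensional vectors are toy values standing in for real text embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_percentile(score: float, background_scores: List[float]) -> float:
    """Assumed definition: percent of background scores that are <= score."""
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    not_above = sum(1 for s in background_scores if s <= score)
    return 100.0 * not_above / len(background_scores)


# Toy vectors; real use would embed the predicted and reference term names.
predicted = [1.0, 0.0]
reference = [1.0, 1.0]
background = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.5, 1.0]]

score = cosine_similarity(predicted, reference)
background_scores = [cosine_similarity(predicted, v) for v in background]
percentile = similarity_percentile(score, background_scores)
print(round(score, 4), percentile)
```

A percentile puts the raw similarity in context: a moderate cosine value can still rank highly if most other terms are even less similar.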
5. Conclusion
- Thought Graph advances gene ontology and bioinformatics by integrating gene set analysis with semantic graphs.
- Demonstrates potential to outperform existing methods in mapping complex gene interactions and functions.
Statistics
Thought Graph can generate diverse yet precise entities to address potential annotation discrepancies in biological processes.
Our framework prioritizes the integration of domain-specific external knowledge bases to understand the semantics of connections within the Thought Graph.