Core Concepts
PROTLLM is a versatile large language model designed to handle both protein-centric and protein-language tasks efficiently.
Abstract:
PROTLLM is proposed for both protein-centric and protein-language tasks.
Features a dynamic protein mounting mechanism.
Utilizes a protein-as-word language modeling approach.
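The protein-as-word idea can be illustrated with a minimal sketch: proteins are treated as extra vocabulary entries, so next-token prediction scores ordinary word tokens and candidate proteins in a single softmax. The vocabularies, the toy logits, and the function names below are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch (assumed, not ProtLLM's code): proteins join the word vocabulary,
# and one softmax covers both words and candidate protein IDs.
import math

WORD_VOCAB = ["the", "protein", "binds", "ATP"]
PROTEIN_VOCAB = ["P69905", "P68871"]  # candidate protein IDs (illustrative)
UNIFIED_VOCAB = WORD_VOCAB + PROTEIN_VOCAB

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_distribution(logits):
    """Score every entry of the unified (word + protein) vocabulary."""
    assert len(logits) == len(UNIFIED_VOCAB)
    return dict(zip(UNIFIED_VOCAB, softmax(logits)))

# Toy logits: the model may emit a protein ID as "the next word".
dist = next_token_distribution([0.1, 0.2, 0.0, -0.5, 2.0, 1.0])
best = max(dist, key=dist.get)  # here, the protein "P69905"
```

Because protein IDs compete in the same distribution as words, the same autoregressive objective trains text generation and protein prediction jointly.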
Introduction:
Importance of understanding proteins for AI advancement in bioscience.
Deep learning techniques applied to various protein-centric applications.
Methods:
Description of PROTLLM framework with autoregressive transformer language model, protein encoder, and cross-modal connectors.
Dynamic protein mounting mechanism explained.
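Dynamic protein mounting can be sketched as follows: wherever the input text mentions a protein (marked here with a `<protein:ID>` placeholder), the protein encoder's output embedding is mounted into the token stream in place of a word embedding. The marker syntax, the toy encoders, and the 4-dimensional embeddings are all assumptions for illustration.

```python
# Sketch (assumed): mount protein-encoder embeddings inline with word
# embeddings to form the LLM's input sequence.
EMBED_DIM = 4

def word_embed(token):
    # Toy deterministic word embedding (stand-in for the LLM's embedding table).
    return [float((hash(token) >> s) % 7) for s in (0, 3, 6, 9)]

def protein_encoder(protein_id):
    # Stand-in for a pretrained protein encoder's pooled output.
    return [float(ord(c) % 5) for c in protein_id[:EMBED_DIM]]

def mount(tokens):
    """Build the input embedding sequence, mounting protein embeddings inline."""
    seq = []
    for tok in tokens:
        if tok.startswith("<protein:") and tok.endswith(">"):
            seq.append(protein_encoder(tok[len("<protein:"):-1]))
        else:
            seq.append(word_embed(tok))
    return seq

seq = mount(["The", "<protein:P69905>", "binds", "oxygen"])
```

Mounting happens per input rather than per vocabulary, so the model can attend to arbitrary proteins appearing anywhere in the text, not just a fixed pre-registered set.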
InterPT Dataset:
Construction of InterPT dataset for pre-training PROTLLM.
Includes multi-protein scientific articles, protein-annotation pairs, and instruction-following data.
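For concreteness, an interleaved InterPT-style pre-training example might look like the record below; the field names, marker token, and UniProt IDs are assumptions, not the dataset's actual schema.

```python
# Hypothetical shape of one interleaved pre-training example (assumed schema):
# free text with protein markers, plus the protein IDs that fill them in order.
example = {
    "source": "multi-protein scientific article",
    "text": "Hemoglobin subunits <protein> and <protein> cooperate in O2 transport.",
    "proteins": ["P69905", "P68871"],  # fill the <protein> markers in order
}

# Sanity check: one protein ID per marker in the text.
n_markers = example["text"].count("<protein>")
```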
Experiments:
Evaluation on three types of downstream tasks: protein-centric tasks, in-context learning, and text-guided functional protein retrieval.
Conclusion:
Summary of the effectiveness of PROTLLM in handling diverse tasks related to proteins.
Stats
PROTLLM achieves competitive performance against strong baselines.
The InterPT dataset is constructed from diverse sources and encourages the model to learn protein-related knowledge.