This paper introduces LC-PLM, a protein language model built on a computationally efficient state space model architecture (BiMamba-S). It outperforms Transformer-based models at capturing long-range dependencies within protein sequences and can additionally incorporate biological interaction information from protein-protein interaction graphs, yielding significant improvements on downstream tasks such as protein structure prediction and function prediction.
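The BiMamba-S idea can be pictured as a weight-tied bidirectional wrapper around a causal sequence mixer. The sketch below is a minimal illustration of that pattern, not the paper's implementation: a causal depthwise convolution stands in for the actual Mamba SSM block, and all module names here are placeholders.

```python
# Hypothetical sketch of the BiMamba-S pattern: run one shared (weight-tied)
# causal mixer over the sequence in both directions and fuse the two passes.
# A causal depthwise convolution stands in for the real Mamba SSM mixer.
import torch
import torch.nn as nn

class BiDirectionalBlock(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        # One causal mixer whose weights are shared by both directions,
        # which keeps the parameter count close to a unidirectional model.
        self.mixer = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size - 1, groups=d_model,
        )
        self.norm = nn.LayerNorm(d_model)

    def _causal_mix(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model); causal conv along the length axis
        y = self.mixer(x.transpose(1, 2))[..., : x.size(1)]
        return y.transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fwd = self._causal_mix(x)                  # left-to-right pass
        bwd = self._causal_mix(x.flip(1)).flip(1)  # right-to-left, same weights
        return self.norm(x + fwd + bwd)            # residual + fuse directions

tokens = torch.randn(2, 1024, 256)  # e.g. embeddings of a 1024-residue protein
print(BiDirectionalBlock(256)(tokens).shape)  # torch.Size([2, 1024, 256])
```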
To get the most out of a protein language model under a limited compute budget, model size and dataset size must be jointly optimized as a function of the available compute.
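A common way to frame this tradeoff is the heuristic that training cost scales as C ≈ 6ND for N parameters and D training tokens; fixing a tokens-per-parameter ratio then determines the compute-optimal split. The snippet below only illustrates this accounting; the ratio of 20 tokens per parameter is the Chinchilla estimate for natural-language text, used here as a placeholder rather than a value from this paper.

```python
# Back-of-the-envelope compute accounting, assuming the common heuristic that
# training cost is C ~ 6 * N * D FLOPs (N = parameters, D = training tokens).
# The tokens-per-parameter ratio is what a compute-optimal study fits; the
# ~20 tokens/param default below is the natural-language Chinchilla estimate,
# used only as an illustrative placeholder.

def optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Given a FLOP budget, return (params N, tokens D) with D = ratio * N."""
    n = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

for budget in (1e21, 1e22, 1e23):
    n, d = optimal_split(budget)
    print(f"C={budget:.0e}: N≈{n:.2e} params, D≈{d:.2e} tokens")
```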
DPLM-2 is a multimodal protein language model that combines a discrete diffusion framework with structure tokenization to generate highly compatible protein structures and sequences simultaneously. It outperforms existing methods on co-generation tasks and also performs strongly on folding, inverse folding, and motif-scaffolding.
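One way to picture the multimodal setup: represent a protein as two aligned token tracks, amino-acid tokens and discrete structure tokens, and train one denoiser to recover both from an absorbing-state corruption. The sketch below is an assumption-laden illustration of that idea; the vocabulary sizes and the random "tokenizer" outputs are placeholders, not DPLM-2's actual components.

```python
# Illustrative sketch (not the DPLM-2 implementation): a protein as two
# aligned token tracks -- amino acids and discrete structure tokens from a
# learned structure tokenizer -- corrupted jointly for co-generation.
import torch

AA_VOCAB, STRUCT_VOCAB = 20, 512               # assumed sizes, for illustration
MASK_AA, MASK_STRUCT = AA_VOCAB, STRUCT_VOCAB  # absorbing [MASK] ids

def corrupt(tokens: torch.Tensor, mask_id: int, t: float) -> torch.Tensor:
    """Absorbing-state corruption: mask each position independently w.p. t."""
    masked = torch.rand_like(tokens, dtype=torch.float) < t
    return torch.where(masked, torch.full_like(tokens, mask_id), tokens)

L = 128
seq_tokens = torch.randint(0, AA_VOCAB, (1, L))         # residue identities
struct_tokens = torch.randint(0, STRUCT_VOCAB, (1, L))  # tokenized local structure

t = 0.7  # diffusion time / corruption level
noisy_seq = corrupt(seq_tokens, MASK_AA, t)
noisy_struct = corrupt(struct_tokens, MASK_STRUCT, t)
# A shared denoiser trained on (noisy_seq, noisy_struct) -> (seq, struct)
# supports co-generation (mask both tracks), folding (mask structure only),
# and inverse folding (mask sequence only) by choosing what to corrupt.
```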
DPLM (Diffusion Protein Language Model) is a versatile protein language model that excels at both generative and predictive tasks, offering strong representation learning alongside conditional generation. The approach combines diffusion models with language modeling into a single unified framework for understanding and designing protein sequences.
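The generative side of a masked discrete diffusion LM can be summarized as iterative unmasking: start from an all-[MASK] sequence and repeatedly commit the model's most confident predictions. The sketch below illustrates that sampling loop under assumed names (`model`, `MASK`, a linear unmasking schedule); DPLM's actual denoiser and noise schedule differ in detail.

```python
# Minimal sketch of sampling from a masked discrete diffusion LM: begin from
# an all-[MASK] sequence and iteratively commit high-confidence predictions.
# `dummy_model` is a stand-in returning per-position logits.
import torch

VOCAB, MASK = 20, 20  # 20 amino acids + an absorbing [MASK] id

def sample(model, length: int, steps: int = 8) -> torch.Tensor:
    x = torch.full((1, length), MASK, dtype=torch.long)
    for step in range(steps):
        remaining = int((x == MASK).sum())
        if remaining == 0:
            break
        logits = model(x)                         # (1, L, VOCAB) token logits
        conf, pred = logits.softmax(-1).max(-1)   # confidence and argmax token
        conf = conf.masked_fill(x != MASK, -1.0)  # never re-touch committed slots
        k = max(1, remaining // (steps - step))   # linear unmasking schedule
        top = conf.topk(k, dim=-1).indices        # k most confident masked slots
        x[0, top[0]] = pred[0, top[0]]
    return x

def dummy_model(x: torch.Tensor) -> torch.Tensor:
    """Placeholder denoiser: random logits, just to make the sketch runnable."""
    return torch.randn(x.size(0), x.size(1), VOCAB)

print(sample(dummy_model, length=32))
```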