The author introduces the Diffusion Protein Language Model (DPLM), a protein language model with both generative and predictive capabilities for protein sequences. The approach combines diffusion models with language models into a single framework for understanding and designing proteins.
DPLM handles both generative and predictive tasks, offering superior representation learning and conditional generation capabilities.
DPLM-2 is a novel multimodal protein language model that uses a discrete diffusion framework and structure tokenization to generate protein structures and sequences simultaneously, so that the two are highly compatible. It outperforms existing methods in co-generation tasks and performs strongly in folding, inverse folding, and motif-scaffolding.
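To make "discrete diffusion" over a protein sequence more concrete, here is a minimal toy sketch of one common form of it, in which the forward process replaces residues with a mask symbol and the reverse process fills them back in. This is an illustration under stated assumptions, not the DPLM or DPLM-2 implementation: the summary above does not say which noising scheme is used, and the names `MASK`, `mask_sequence`, and `toy_denoise` are hypothetical. The denoiser here samples random residues where a trained network would predict them.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # hypothetical mask symbol, chosen for this sketch only


def mask_sequence(seq, t, rng):
    """Forward (noising) step: mask each residue independently with probability t."""
    return "".join(MASK if rng.random() < t else aa for aa in seq)


def toy_denoise(noisy, rng):
    """Reverse step stand-in: fill each masked position with a random residue.

    A real model would predict residues from the unmasked context instead.
    """
    return "".join(rng.choice(AMINO_ACIDS) if c == MASK else c for c in noisy)


rng = random.Random(0)
seq = "MKTAYIAKQR"
noisy = mask_sequence(seq, 0.5, rng)   # partially masked sequence
restored = toy_denoise(noisy, rng)     # same length, no mask symbols left
print(noisy)
print(restored)
```

At `t = 0` the sequence is left untouched and at `t = 1` every position is masked, so `t` plays the role of the noise level. Positions that were never masked pass through the reverse step unchanged.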