Discrete protein language representation for sequence-structure co-generation.
The authors introduce FoldTokenizer, which converts protein sequence and structure into a shared discrete language, enabling generative models such as FoldGPT for sequence-structure co-generation.
The authors introduce DiMA, a model that applies continuous diffusion to embeddings produced by a protein language model in order to generate amino acid sequences, surpassing leading approaches in both quality and diversity.
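As a rough illustration of the continuous-diffusion idea behind DiMA, the sketch below applies the standard forward (noising) process of Gaussian diffusion to per-residue embeddings. This is a generic diffusion sketch, not DiMA's actual implementation; the embedding width, sequence length, and cosine schedule here are placeholder assumptions.

```python
import numpy as np

def cosine_alpha_bar(t, T):
    """Cumulative signal level alpha_bar at step t (cosine schedule, assumed)."""
    return np.cos(0.5 * np.pi * t / T) ** 2

def noise_embeddings(x0, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab) * x0, (1 - ab) * I).

    x0: clean per-residue embeddings, shape (seq_len, dim).
    Returns a noised version of the embeddings at diffusion step t.
    """
    ab = cosine_alpha_bar(t, T)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

rng = np.random.default_rng(0)
# Hypothetical sizes: 64 residues, 320-dim embeddings, 1000 diffusion steps.
seq_len, dim, T = 64, 320, 1000
x0 = rng.standard_normal((seq_len, dim))
x_mid = noise_embeddings(x0, t=500, T=T, rng=rng)
print(x_mid.shape)  # (64, 320)
```

A trained denoiser would then be run in the reverse direction, iteratively predicting the clean embeddings from noised ones, after which the embeddings are decoded back into an amino acid sequence.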
The authors introduce the Diffusion Protein Language Model (DPLM), a versatile protein language model with strong generative and predictive capabilities for protein sequences. The approach combines diffusion models with language models into a unified tool for understanding and designing proteins.