Introducing FoldTokenizer to create a discrete protein language for sequence-structure representation, enabling innovative generative models like FoldGPT.
Discrete protein language representation for sequence-structure co-generation.