Core Concepts
Token Merging (ToMe) is a novel, training-free method that improves semantic binding in text-to-image synthesis by merging relevant text tokens, thereby enhancing the alignment between generated images and complex textual descriptions.
Hu, T., Li, L., van de Weijer, J., Gao, H., Khan, F.S., Yang, J., Cheng, M., Wang, K., & Wang, Y. (2024). Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis. Advances in Neural Information Processing Systems, 38.
This research paper aims to address the semantic binding problem in text-to-image synthesis, where generated images often fail to accurately reflect the relationships between objects and their attributes or related sub-objects described in the input text prompts.