Limitations of Adapting Pre-trained Language Models for Auto-regressive Text-to-Image Generation
Pre-trained language models provide no significant benefit for auto-regressive text-to-image generation, because image tokens differ fundamentally from text tokens.