Large Language Models outperform diffusion models in visual generation with the introduction of MAGVIT-v2, a novel video tokenizer.