The author argues that compression is theoretically important in tokenization and demonstrates empirically that it matters for language model performance.
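One way to make the notion of tokenizer compression concrete is to measure characters per token: a tokenizer that emits fewer tokens for the same text compresses more. The sketch below is illustrative only, using a hypothetical `compression_ratio` helper and two trivial tokenizers (character-level and whitespace-level) as stand-ins for real subword tokenizers such as BPE.

```python
def compression_ratio(text: str, tokenize) -> float:
    """Characters per token: a higher value means stronger compression."""
    tokens = tokenize(text)
    return len(text) / len(tokens)

text = "the quick brown fox jumps over the lazy dog"

# A character-level tokenizer compresses nothing: one token per character.
char_level = compression_ratio(text, list)       # 1.0

# A whitespace tokenizer compresses more: one token per word.
word_level = compression_ratio(text, str.split)  # > 1.0
```

Real subword tokenizers fall between these two extremes, and the author's claim is that where a tokenizer lands on this axis has measurable downstream effects.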