Analyzing Counterexamples to Tokenization and the Noiseless Channel
The authors present counterexamples to the Rényi efficiency hypothesis for tokenization, which holds that a tokenizer with higher Rényi efficiency yields better downstream model performance. The counterexamples are scenarios in which Rényi efficiency is higher but downstream performance is not.
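
For readers unfamiliar with the metric, the following is a minimal sketch of how Rényi efficiency might be computed. It assumes the commonly used definition: the Rényi entropy of order alpha of the unigram token distribution, divided by the log of the vocabulary size. The function name `renyi_efficiency`, the `vocab_size` parameter, and the default `alpha=2.5` are illustrative assumptions and are not taken from the text above.

```python
import math
from collections import Counter


def renyi_efficiency(tokens, alpha=2.5, vocab_size=None):
    """Rényi efficiency of a token sequence (illustrative sketch).

    Assumed definition: H_alpha(p) / log(|V|), where p is the empirical
    unigram distribution over tokens and |V| is the vocabulary size.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]

    # Assumption: if no vocabulary size is given, use the observed types.
    size = vocab_size if vocab_size is not None else len(counts)
    if size < 2:
        raise ValueError("vocabulary size must be at least 2")

    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 recovers Shannon entropy.
        entropy = -sum(p * math.log(p) for p in probs)
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

    return entropy / math.log(size)


# A uniform distribution is maximally efficient; a skewed one is less so.
uniform = renyi_efficiency(["a", "b", "c", "d"])
skewed = renyi_efficiency(["a"] * 9 + ["b"])
```

Under this definition the value lies between 0 and 1, and a tokenizer whose token frequencies are more balanced scores higher. The counterexamples concern whether a higher score of this kind actually tracks downstream performance.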