Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Modernizing n-gram language models by scaling the training data to 5 trillion tokens and extending the order n to be unbounded (the ∞-gram), enabling novel analyses of human-written and machine-generated text, and improving the performance of large neural language models.
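
To make the ∞-gram idea concrete, here is a minimal sketch of the estimator: back off to the longest suffix of the context that appears in the training data, then estimate the next-token probability from raw counts of that suffix and its continuation. This is illustrative only; the function and variable names (infgram_prob, count_occurrences) are hypothetical, the naive linear scan stands in for the paper's suffix-array-based counting engine, and a whitespace-tokenized corpus is assumed in place of real token IDs.

```python
def count_occurrences(corpus: list[str], pattern: list[str]) -> int:
    """Count occurrences of a token sequence in the corpus (naive scan;
    the actual engine counts via a suffix array over the tokenized corpus)."""
    n, m = len(corpus), len(pattern)
    return sum(corpus[i:i + m] == pattern for i in range(n - m + 1))

def infgram_prob(corpus: list[str], context: list[str], token: str) -> float:
    """P(token | context) using the longest context suffix found in the corpus."""
    # Walk from the full context toward shorter suffixes; the first suffix
    # with a nonzero count determines the "effective n" used for estimation.
    for start in range(len(context) + 1):
        suffix = context[start:]
        denom = count_occurrences(corpus, suffix) if suffix else len(corpus)
        if denom > 0:
            numer = count_occurrences(corpus, suffix + [token])
            return numer / denom
    return 0.0  # unreachable: the empty suffix always matches

# Toy usage: "the cat sat on the" occurs twice, followed once by "mat".
corpus = "the cat sat on the mat . the cat sat on the sofa .".split()
print(infgram_prob(corpus, "the cat sat on the".split(), "mat"))  # 0.5
```

Because the matched suffix can be arbitrarily long, no fixed n is ever chosen, which is what "unbounded" means here; scaling the counted corpus to trillions of tokens is what makes long suffix matches common enough for this estimator to be useful.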