Retrieval-Based Speculative Decoding: A Novel Approach to Accelerate Language Model Generation
REST (Retrieval-Based Speculative Decoding) retrieves draft tokens from a datastore instead of generating them with a separate draft model, so it accelerates language model generation without requiring any additional training.
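The idea of drafting by retrieval can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`retrieve_draft`, `verify`, `toy_next_token`), the list-based datastore, the longest-suffix matching rule, and the greedy acceptance rule are all assumptions chosen for illustration. A real system would check the whole draft in a single forward pass of the language model; the loop below calls a toy model one token at a time only to keep the logic readable.

```python
from typing import Callable, List, Sequence

Token = str


def retrieve_draft(
    context: Sequence[Token],
    datastore: Sequence[Sequence[Token]],
    max_suffix: int = 4,
    draft_len: int = 3,
) -> List[Token]:
    """Return the tokens following the longest context suffix found in the datastore.

    Hypothetical helper: a linear scan stands in for a real index.
    """
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = list(context[-n:])
        for doc in datastore:
            # Stop early enough that at least one continuation token exists.
            for i in range(len(doc) - n):
                if list(doc[i:i + n]) == suffix:
                    return list(doc[i + n:i + n + draft_len])
    return []  # no match: fall back to ordinary decoding


def verify(
    context: Sequence[Token],
    draft: Sequence[Token],
    next_token: Callable[[List[Token]], Token],
) -> List[Token]:
    """Greedy verification: keep draft tokens while they match the model.

    At the first mismatch, the model's own token is used and checking stops,
    so the output equals what greedy decoding alone would have produced.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = next_token(list(context) + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every draft token was accepted; the model contributes one more token.
    accepted.append(next_token(list(context) + accepted))
    return accepted


# Toy stand-in for a language model (assumption): predicts from the last token only.
_TOY_TABLE = {"the": "cat", "cat": "sat", "sat": "on", "on": "a", "a": "mat"}


def toy_next_token(tokens: List[Token]) -> Token:
    return _TOY_TABLE.get(tokens[-1], ".")


DATASTORE = [["the", "cat", "sat", "on", "the", "mat"]]

context = ["the", "cat"]
draft = retrieve_draft(context, DATASTORE)        # ["sat", "on", "the"]
accepted = verify(context, draft, toy_next_token)  # ["sat", "on", "a"]
```

In this toy run, two of the three retrieved tokens agree with the model, so a single verification step yields three tokens (two accepted draft tokens plus the model's correction) rather than one. No component here is trained, which mirrors the training-free property described above.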