Accelerating Large Language Model Inference through Speculative Execution
Speculative execution, a technique borrowed from computer architecture, can substantially speed up large language model inference: a cheap draft step proposes several tokens ahead, and the full model then verifies those tokens in parallel instead of generating them one at a time.
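
The draft-and-verify idea can be illustrated with a minimal sketch. It assumes greedy decoding and uses two toy stand-in "models" (`target_next` and `draft_next`, both hypothetical and invented here for illustration) rather than real networks. It is not the method of any particular library: the names, the block size `k`, and the exact-match acceptance rule are assumptions made for this example. In a real system the verification step would be a single batched forward pass of the large model, which is where the speedup would come from.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token
# (greedy decoding). Real models would return logits instead.
Model = Callable[[List[int]], int]


def target_next(prefix: List[int]) -> int:
    """Hypothetical large model: a deterministic toy rule over the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[int]) -> int:
    """Hypothetical small model: agrees with the target except when the
    prefix length is a multiple of 4, where it guesses wrong."""
    tok = target_next(prefix)
    return (tok + 1) % 50 if len(prefix) % 4 == 0 else tok


def greedy_decode(prompt: List[int], target: Model, num_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], target: Model, draft: Model, num_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch.

    Returns the generated sequence and the number of verification passes.
    """
    tokens = list(prompt)
    goal = len(prompt) + num_new
    verify_passes = 0
    while len(tokens) < goal:
        # 1. Draft: the small model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. Verify: ask the target for its choice at each of the k + 1
        #    positions. Simulated with a loop here; a real system would
        #    score all positions in one batched forward pass.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        verify_passes += 1

        # 3. Accept the longest prefix on which draft and target agree.
        n = 0
        while n < k and proposal[n] == verified[n]:
            n += 1
        tokens.extend(proposal[:n])
        # The target's own token at position n is always valid: it is a
        # correction after a mismatch, or a bonus token if all k matched.
        tokens.append(verified[n])
    return tokens[:goal], verify_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, passes = speculative_decode(prompt, target_next, draft_next, num_new=20)
    assert spec == greedy_decode(prompt, target_next, num_new=20)
    print(f"generated 20 tokens in {passes} verification passes")
```

Under these assumptions the speculative output is identical to plain greedy decoding with the target model; only the number of target passes changes. Sampling-based variants replace the exact-match check in step 3 with a probabilistic acceptance rule, which this sketch does not cover.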