Enhancing Speculative Decoding via Knowledge Distillation: DistillSpec Improves Alignment Between Draft and Target Language Models
DistillSpec, a knowledge distillation method, aligns a small draft model with a large target model, raising the rate at which the target accepts the draft's proposed tokens and thereby speeding up speculative decoding without compromising generation quality.
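The speedup from speculative decoding hinges on how often the target model accepts the draft model's proposed tokens: a draft token x is accepted with probability min(1, p_target(x)/p_draft(x)), and on rejection a token is resampled from the normalized residual max(0, p_target − p_draft), which keeps the output distribution exactly that of the target model. A minimal sketch of this standard acceptance rule (the function names and toy distributions are illustrative, not taken from the paper):

```python
import numpy as np

def accept_prob(p_target, p_draft, token):
    """Probability that the target accepts a given draft token."""
    return min(1.0, p_target[token] / p_draft[token])

def residual_dist(p_target, p_draft):
    """Distribution to resample from after a rejection:
    normalized max(0, p_target - p_draft)."""
    r = np.maximum(p_target - p_draft, 0.0)
    return r / r.sum()

def expected_acceptance(p_target, p_draft):
    """Expected acceptance rate = sum_x min(p_target(x), p_draft(x));
    it grows as the draft model aligns with the target."""
    return float(np.minimum(p_target, p_draft).sum())

# Toy 3-token vocabulary (illustrative numbers only).
p = np.array([0.6, 0.3, 0.1])           # target distribution
q_misaligned = np.array([0.2, 0.3, 0.5])
q_aligned = np.array([0.55, 0.3, 0.15])  # e.g., after distillation

print(expected_acceptance(p, q_misaligned))  # 0.6
print(expected_acceptance(p, q_aligned))     # 0.95
```

In this toy example, better draft-target alignment lifts the expected acceptance rate from 0.6 to 0.95, which is precisely the quantity DistillSpec's distillation objective aims to improve.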