Efficient Autoregressive Decoding with Adaptive Feed Forward Skipping in Large Language Models
FFN-SkipLLM is a novel fine-grained, input-adaptive skipping strategy that bypasses roughly 25-30% of the feed-forward network (FFN) blocks in autoregressive large language models (LLMs) with only a marginal change in performance on knowledge-intensive tasks.
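To make the idea concrete, here is a minimal NumPy sketch of what input-adaptive FFN skipping might look like inside a single decoder block. This is an illustrative toy, not the paper's implementation: the cosine-similarity criterion, the `threshold` value, and all function names (`ffn`, `block_forward`) are hypothetical stand-ins chosen for clarity.

```python
import numpy as np

def ffn(x, w1, w2):
    # A simple two-layer feed-forward network with ReLU activation.
    return np.maximum(x @ w1, 0) @ w2

def cosine_sim(a, b):
    # Cosine similarity between two hidden-state vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def block_forward(x, w1, w2, prev_x, threshold=0.9):
    """Toy adaptive FFN skip: if the current hidden state is nearly
    identical to the previous layer's (hypothetical cosine criterion),
    skip the FFN and pass the state through unchanged.

    Returns (output, skipped) so callers can count how often the
    FFN was bypassed.
    """
    if prev_x is not None and cosine_sim(x, prev_x) > threshold:
        return x, True                # FFN skipped: identity shortcut
    return x + ffn(x, w1, w2), False  # normal residual FFN path
```

In a full model, `block_forward` would run once per layer during decoding, and the fraction of layers returning `skipped=True` would correspond to the ~25-30% FFN compute saved.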