Core Concepts
LookupFFN proposes a memory-lookup approach that reduces the FLOPs of Feed-Forward Networks, making them CPU-friendly.
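For intuition, a back-of-envelope FLOP comparison per token (the sizes and table count below are illustrative assumptions, not figures from the paper):

```python
# Back-of-envelope FLOP comparison per token (illustrative numbers only;
# d_model, d_ff, and num_tables are assumptions, not the paper's values).
d_model, d_ff = 768, 3072                # typical BERT-base FFN sizes
gemm_flops = 2 * (d_model * d_ff) * 2    # two GEMMs, 2 FLOPs per multiply-add

# A lookup-based FFN instead gathers a few rows per hash table and sums
# them, so the arithmetic reduces to additions over the gathered rows.
num_tables = 16
lookup_flops = num_tables * d_model      # roughly one row-add per table

print(f"GEMM FFN:   {gemm_flops / 1e6:.1f} MFLOPs/token")    # ~9.4 MFLOPs
print(f"Lookup FFN: {lookup_flops / 1e3:.1f} kFLOPs/token")  # ~12.3 kFLOPs
```

The remaining cost of the lookup variant is dominated by memory traffic (hashing and gathering table rows), which is exactly the resource where CPUs, with their large caches, are strong.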
1. Introduction
CPUs are gaining importance for inference in data centers.
CPUs offer advantages in latency, security, and cost.
CPUs trail GPUs in compute throughput but provide much larger caches.
2. Preliminaries
FFNs heavily rely on GEMM, which is compute-intensive.
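For reference, a minimal PyTorch sketch of the standard Transformer FFN, whose two GEMMs dominate the FLOP count (sizes are typical BERT-base values):

```python
import torch
import torch.nn as nn

class VanillaFFN(nn.Module):
    """Standard Transformer FFN: two dense GEMMs dominate the cost."""
    def __init__(self, d_model: int = 768, d_ff: int = 3072):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)    # GEMM 1: d_model -> d_ff
        self.down = nn.Linear(d_ff, d_model)  # GEMM 2: d_ff -> d_model
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))
```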
Various methods have been proposed to reduce the FLOP requirements of FFNs.
LookupFFN aims to make FFNs FLOP-lite and CPU-friendly.
3. FFN as Lookups
An end-to-end construction casts FFN computation as differentiable table lookups, an efficient alternative to GEMM.
The differentiable lookup formulation is built on efficient hash and gather operations (sketched below).
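A minimal sketch of the differentiable-lookup idea, assuming a generic soft-hash formulation: the paper's actual hash function, gradient handling, and table layout differ, and all names and sizes here (`hash_proj`, `num_tables`, `num_buckets`) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LookupLayerSketch(nn.Module):
    """Sketch of a differentiable table-lookup layer (not the paper's
    exact formulation): a learned hash maps each input to a bucket per
    table, and the output is the sum of the gathered table rows."""

    def __init__(self, d_in=768, d_out=768, num_tables=16, num_buckets=256):
        super().__init__()
        # One learned hash projection and one learnable value table per lookup.
        self.hash_proj = nn.Parameter(
            torch.randn(num_tables, d_in, num_buckets) / d_in ** 0.5)
        self.tables = nn.Parameter(
            torch.randn(num_tables, num_buckets, d_out) * 0.02)

    def forward(self, x: torch.Tensor, hard: bool = False) -> torch.Tensor:
        # Bucket scores per table: (batch, num_tables, num_buckets)
        scores = torch.einsum("bd,tdk->btk", x, self.hash_proj)
        if hard:
            # Inference: hard bucket choice -> pure gather, no GEMM.
            idx = scores.argmax(dim=-1)                           # (batch, tables)
            t = torch.arange(self.tables.size(0), device=x.device)
            rows = self.tables[t, idx]                            # (batch, tables, d_out)
        else:
            # Training: soft assignment keeps the lookup differentiable.
            weights = F.softmax(scores, dim=-1)
            rows = torch.einsum("btk,tkd->btd", weights, self.tables)
        return rows.sum(dim=1)
```

Training uses the soft path; at inference, `hard=True` turns each table into an argmax plus a gather, so the layer's cost becomes memory traffic rather than GEMM FLOPs.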
4. Experiments
LookupFFN achieves lower perplexity with fewer FLOPs compared to baselines.
LookupFFN scales well to larger models with significant FLOP reduction.
Downstream finetuning shows competitive performance with reduced FLOPs.
5. Conclusions
Balancing the trade-off between compute and memory resources is crucial for future DNN architectures.
LookupFFN's benefits can extend to other DNN models, complementing server chip developments.
Stats
CPU-based inference in the data center is growing in importance, as evidenced by recent server-chip announcements from IBM, Intel, AMD, and ARM (Lichtenau et al., 2022).
CPUs provide tremendously large caches, in the range of 128 MB to 192 MB (Burd et al., 2022).
LookupFFN achieves lower perplexity with fewer FLOPs compared to baselines (Table 1).