Core Concepts
NeuPIMs is a heterogeneous accelerator system that combines NPU and PIM devices to improve the efficiency of large language model (LLM) inference.
Abstract
NeuPIMs optimizes the GEMM and GEMV computations that dominate Large Language Model (LLM) inference. By combining NPUs with processing-in-memory (PIM) technology, NeuPIMs achieves significant throughput improvements over existing systems. Its hardware-algorithm co-design addresses both microarchitectural and algorithmic challenges, improving resource utilization and overall efficiency.
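To make the GEMM/GEMV distinction concrete, here is a minimal NumPy sketch (not NeuPIMs code; the dimensions are hypothetical) showing the two kernel shapes: batched prompt processing is a matrix-matrix product (GEMM), which suits an NPU's compute arrays, while per-token decoding is a matrix-vector product (GEMV), which is memory-bandwidth-bound and suits PIM.

```python
import numpy as np

d_model = 4096                          # hidden size (assumed for illustration)
seq_len = 512                           # prompt length (assumed for illustration)
W = np.random.rand(d_model, d_model)    # one projection weight matrix

# Prompt phase: all prompt tokens processed at once -> GEMM
# (matrix-matrix multiply), compute-intensive.
X = np.random.rand(seq_len, d_model)
prefill_out = X @ W                     # shape (512, 4096)

# Decoding phase: one new token per step -> GEMV
# (matrix-vector multiply), dominated by weight-memory traffic.
x = np.random.rand(d_model)
decode_out = x @ W                      # shape (4096,)
```

The same weight matrix `W` is read in both phases, but in decoding it is reused for only a single vector per step, which is why GEMV throughput is limited by memory bandwidth rather than compute.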
Stats
NeuPIMs achieves a 2.3× throughput improvement over an NPU-only approach.
Compared to a naïve NPU-PIM integrated system, NeuPIMs achieves a 1.6× throughput improvement.