NeuPIMs: A Heterogeneous Acceleration System for Large Language Model Batched Inference
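NeuPIMs proposes a heterogeneous accelerator system for efficient batched inference of large language models (LLMs). It pairs a GEMM-oriented neural processing unit (NPU) with GEMV-optimized processing-in-memory (PIM) devices. The NPU runs the compute-bound general matrix-matrix multiplications (GEMM), and PIM runs the memory-bound general matrix-vector multiplications (GEMV).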
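The GEMM/GEMV split can be illustrated with a rough arithmetic-intensity calculation (FLOPs per byte moved). In a batched decoding step, the weight matrices are shared by every request in the batch, so a projection becomes a GEMM. Attention, however, reads each request's own key cache, so it stays a GEMV per request no matter how large the batch grows. The sketch below is a simplified model, not code from NeuPIMs. It assumes fp16 storage, hypothetical sizes (d_model = 4096, head_dim = 128, context length 2048), and that every operand crosses the memory bus exactly once, ignoring on-chip caching and tiling.

```python
# Simplified arithmetic-intensity model (illustrative assumptions, not from NeuPIMs).
BYTES_PER_ELEM = 2  # assumption: fp16 activations, weights, and KV cache


def gemm_intensity(batch: int, d_in: int, d_out: int) -> float:
    """FLOPs per byte for a shared-weight projection (batch x d_in) @ (d_in x d_out)."""
    flops = 2 * batch * d_in * d_out  # one multiply and one add per MAC
    # Inputs, weights (read once and shared by the whole batch), and outputs.
    bytes_moved = BYTES_PER_ELEM * (batch * d_in + d_in * d_out + batch * d_out)
    return flops / bytes_moved


def gemv_intensity(seq_len: int, head_dim: int) -> float:
    """FLOPs per byte for one request's attention scores q @ K^T over its own key cache."""
    flops = 2 * seq_len * head_dim
    # Query vector, the request's private key cache, and the score vector.
    bytes_moved = BYTES_PER_ELEM * (head_dim + seq_len * head_dim + seq_len)
    return flops / bytes_moved


if __name__ == "__main__":
    for batch in (1, 16, 64, 256):
        # GEMM intensity grows with batch because weight traffic is amortized.
        print(f"batch={batch:4d}  GEMM {gemm_intensity(batch, 4096, 4096):7.2f} FLOP/B")
    # GEMV intensity is independent of batch: each request brings its own KV cache.
    print(f"attention GEMV     {gemv_intensity(2048, 128):7.2f} FLOP/B")
```

Under these assumptions, a batch-64 projection reaches about 62 FLOPs per byte, and that figure keeps rising with batch size. The per-request attention score computation stays just under 1 FLOP per byte regardless of batch size. Its throughput is therefore limited by memory bandwidth rather than compute, which is the kind of operation that processing-in-memory targets.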