NeuPIMs proposes a heterogeneous accelerator system that combines neural processing units (NPUs) with processing-in-memory (PIM) devices to improve the efficiency of large language model inference.