Efficient End-to-End Acceleration of Autoregressive Transformer Models using Hybrid Process-in-Memory Architecture
PIM-GPT, a hybrid process-in-memory (PIM) accelerator, achieves state-of-the-art performance and energy efficiency on autoregressive Transformer models such as GPT by offloading memory-bound operations, chiefly the matrix-vector multiplications that dominate token-by-token generation, to PIM while an ASIC handles the remaining computations.
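The hybrid split described above can be sketched as a simple dispatch rule. This is a hypothetical illustration, not the paper's implementation: the operation names and routing sets below are my own assumptions about how a decoder layer's work might be partitioned between the PIM banks and the ASIC.

```python
# Hypothetical sketch (op names and routing sets are illustrative, not
# taken from the paper): route each Transformer operation to the PIM
# banks or the ASIC, mirroring the hybrid split described above.

# During autoregressive decoding (effective batch size 1), the fully
# connected layers reduce to matrix-vector products that stream large
# weight matrices from DRAM once per token: memory-bound, so map them
# to PIM, where compute sits next to the data.
GEMV_OPS = {
    "qkv_projection", "attention_scores", "attention_context",
    "output_projection", "ffn_up", "ffn_down",
}

# Nonlinear and bookkeeping steps touch little data but need general
# arithmetic that in-DRAM logic typically lacks: map them to the ASIC.
OTHER_OPS = {"softmax", "gelu", "layernorm", "residual_add"}

def dispatch(op: str) -> str:
    """Return which unit of the hybrid accelerator executes `op`."""
    if op in GEMV_OPS:
        return "PIM"
    if op in OTHER_OPS:
        return "ASIC"
    raise ValueError(f"unknown operation: {op}")

# One decoder layer's schedule for a single generated token.
layer = ["qkv_projection", "attention_scores", "softmax",
         "attention_context", "output_projection", "residual_add",
         "layernorm", "ffn_up", "gelu", "ffn_down", "residual_add"]
for op in layer:
    print(f"{op:18s} -> {dispatch(op)}")
```

The routing criterion here is operation type rather than a measured arithmetic-intensity threshold; the key point is only that memory-bound matrix-vector work and the remaining computations land on different units.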