Enabling Efficient Hybrid Systolic Computation in Shared L1-Memory Manycore Clusters
The proposed hybrid architecture enables efficient systolic execution on shared-memory, multi-core architectures without compromising their general-purpose capabilities, performance, or programmability.