Core Concepts
PUMA introduces a novel memory allocation mechanism to support Processing-Using-Memory architectures, addressing the limitations of traditional memory allocators in meeting the specific requirements of PUD substrates.
Abstract
The paper discusses the challenges traditional memory allocators face in supporting Processing-Using-Memory (PUM) architectures, focusing in particular on Processing-Using-DRAM (PUD) operations. It highlights the inefficiencies of standard memory allocation routines in meeting the data layout and alignment requirements of PUD substrates. To address these issues, a new memory allocation routine called PUMA is proposed to enable aligned data allocation for PUD instructions without requiring hardware modifications.
PUMA leverages internal DRAM mapping information and huge pages to ensure proper data alignment and allocation for PUD operations. The routine builds on three main components: DRAM organization information, the DRAM interleaving scheme, and a pool of huge pages reserved for PUD memory objects. By splitting huge pages into finer-grained units aligned with DRAM subarrays, PUMA increases the likelihood that PUD operations can actually be executed in DRAM, improving performance.
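The huge-page splitting described above can be sketched as a simple pool allocator that carves a huge page into subarray-aligned chunks. This is a minimal illustration, not PUMA's actual implementation: the names (`puma_pool_t`, `puma_alloc_chunk`) and the 2 MB huge-page and 64 KB chunk sizes are assumptions for the sketch; in PUMA the real granularity comes from the DRAM organization and interleaving information.

```c
#include <stdint.h>

/* Illustrative assumptions only -- real values would be derived from the
 * DRAM organization information and interleaving scheme PUMA consults. */
#define HUGE_PAGE_SIZE (2u * 1024 * 1024)  /* assumed 2 MB huge page */
#define SUBARRAY_SIZE  (64u * 1024)        /* assumed subarray-aligned chunk */

/* One huge page carved into subarray-aligned chunks (hypothetical). */
typedef struct {
    uintptr_t base;  /* start address of the huge page */
    uint32_t  next;  /* index of the next free chunk */
} puma_pool_t;

/* Hand out the next subarray-aligned chunk, or 0 when the page is full. */
static uintptr_t puma_alloc_chunk(puma_pool_t *pool) {
    uint32_t nchunks = HUGE_PAGE_SIZE / SUBARRAY_SIZE;
    if (pool->next >= nchunks)
        return 0;
    return pool->base + (uintptr_t)(pool->next++) * SUBARRAY_SIZE;
}
```

Because every chunk starts at a multiple of the assumed subarray size within a contiguous huge page, objects placed in different chunks fall into distinct subarray-aligned regions, which is the property PUD instructions rely on.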
Evaluation results demonstrate that PUMA significantly outperforms baseline memory allocators across various micro-benchmarks and allocation sizes. The performance improvements are more pronounced with larger data allocations due to reduced data movement between DRAM and CPU. Overall, PUMA proves to be an efficient and practical solution for memory allocation in PUD substrates.
Stats
A typical DRAM subarray has 1024 DRAM rows, each with 1024 DRAM columns.
Using malloc or posix_memalign results in 0% of PUD operations being executable in DRAM, because neither routine guarantees the physical data alignment PUD substrates require.
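The 0% figure follows from the gap between virtual and physical alignment. A sketch of the check involved (the helper `aligned_to` and the 64 KB subarray size are illustrative assumptions): even when posix_memalign returns a virtually aligned pointer, the OS maps the underlying base pages to arbitrary physical frames, so the data's placement inside DRAM subarrays remains uncontrolled.

```c
#include <stdint.h>
#include <stddef.h>

#define SUBARRAY_BYTES (64u * 1024) /* illustrative subarray-sized boundary */

/* Check whether an address sits on a given power-of-two boundary.
 * Note: this only tests the VIRTUAL address. With malloc (typically
 * 16-byte aligned) this check fails for subarray-sized boundaries; with
 * posix_memalign it can pass, yet the PHYSICAL frames backing the range
 * are still chosen freely by the OS, so DRAM-internal alignment -- what
 * PUD operations need -- is not guaranteed either way. */
int aligned_to(const void *p, size_t boundary) {
    return ((uintptr_t)p % boundary) == 0;
}
```

This is why PUMA allocates from huge pages, whose large contiguous physical extent lets the allocator control placement relative to DRAM subarrays.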
For large-enough allocation sizes (e.g., 32 KB), huge-page-based memory allocation alone enables only up to 60% of PUD operations to be successfully executed in DRAM.