Key Concepts
HURRY, a reconfigurable and multifunctional ReRAM-based in-situ accelerator, enhances spatial and temporal utilization of ReRAM arrays to achieve significant performance, energy, and area efficiency improvements over existing ReRAM-based accelerators.
Summary
The paper introduces HURRY, a reconfigurable and multifunctional ReRAM-based in-situ accelerator designed to address the spatial and temporal underutilization issues in conventional ReRAM-based accelerators.
Key highlights:
- Spatial Utilization: HURRY uses a block activation scheme (BAS) to dynamically resize and activate portions of the large 512x512 ReRAM arrays, balancing spatial utilization and peripheral overhead.
- Temporal Utilization: HURRY incorporates functional blocks (FBs) for various CNN operations like convolution, ReLU, max pooling, and softmax, reducing data movement between ReRAM and computing units.
- System-level Scheduling: HURRY employs fine-grained pipelining of FBs and strategic data mapping to further optimize performance.
- Evaluation Results: Compared to existing baselines, HURRY achieves up to 3.35x speedup, 5.72x higher energy efficiency, and 7.91x greater area efficiency for CNN inference tasks.
The paper concludes that HURRY's reconfigurability and multifunctionality together address both the spatial and the temporal underutilization of conventional ReRAM-based accelerators, which improves overall system efficiency.
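As a rough illustration of why activating only part of a large array can help, the sketch below measures the utilization of a weight region twice: against the full 512x512 array, and against only the blocks that cover the region. The 128-cell block size, the top-left placement, and the function names are assumptions made for illustration. The summary does not say how BAS chooses block sizes or places weights.

```python
import math

ARRAY_SIZE = 512  # array dimension quoted in the summary
BLOCK_SIZE = 128  # hypothetical block granularity, not taken from the paper


def active_blocks(rows: int, cols: int, block: int = BLOCK_SIZE) -> tuple[int, int]:
    """Blocks (along rows, along columns) needed to cover a rows x cols
    weight region placed at the top-left corner of one array."""
    if not (0 < rows <= ARRAY_SIZE and 0 < cols <= ARRAY_SIZE):
        raise ValueError("region must fit inside a single array")
    return math.ceil(rows / block), math.ceil(cols / block)


def utilization(rows: int, cols: int, block: int = BLOCK_SIZE) -> tuple[float, float]:
    """Return (utilization of the whole array, utilization of activated blocks)."""
    block_rows, block_cols = active_blocks(rows, cols, block)
    used_cells = rows * cols
    whole_array = used_cells / (ARRAY_SIZE * ARRAY_SIZE)
    activated = used_cells / (block_rows * block_cols * block * block)
    return whole_array, activated


if __name__ == "__main__":
    # Hypothetical 300 x 100 weight region.
    whole, activated = utilization(300, 100)
    print(f"whole array: {whole:.3f}, activated blocks only: {activated:.3f}")
```

In this toy model the same weights look poorly utilized against the whole array but much better against the activated blocks, which is the intuition behind resizing the active region instead of shrinking the physical array.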
Statistics
As the unit array size increases from 128x128 to 512x512, the ReRAM spatial utilization drops from 99% to 57%.
Compared to 16 units of 128x128 arrays, a single 512x512 array has a 3.7x lower chip area and 3.4x lower ADC power consumption.
ISAAC spends up to 48% of its runtime on data movement between ReRAM arrays and computing units.
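The drop in spatial utilization with larger arrays can be sketched with a simple tiling model: a layer's weight matrix is split across fixed-size arrays, and any cells left over in partially filled arrays are wasted. The layer shape, the one-weight-per-cell mapping, and the function name are assumptions for illustration. The model is not meant to reproduce the 99% and 57% figures, which come from the paper's own workloads.

```python
import math


def spatial_utilization(rows: int, cols: int, array_size: int) -> float:
    """Fraction of allocated ReRAM cells that actually hold weights when a
    rows x cols weight matrix is tiled over array_size x array_size arrays."""
    tiles = math.ceil(rows / array_size) * math.ceil(cols / array_size)
    return (rows * cols) / (tiles * array_size * array_size)


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernels, 128 input channels, 128 output channels,
    # unrolled into a (3*3*128) x 128 weight matrix.
    rows, cols = 3 * 3 * 128, 128
    for size in (128, 256, 512):
        print(size, spatial_utilization(rows, cols, size))
```

For capacity comparisons such as the one above, note that 16 arrays of 128x128 hold the same number of cells as one 512x512 array (16 × 128 × 128 = 512 × 512 = 262,144).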
Quotes
"HURRY uses a block activation scheme (BAS) to dynamically resize and activate portions of the large 512x512 ReRAM arrays, balancing spatial utilization and peripheral overhead."
"HURRY incorporates functional blocks (FBs) for various CNN operations like convolution, ReLU, max pooling, and softmax, reducing data movement between ReRAM and computing units."
"Compared to existing baselines, HURRY achieves up to 3.35x speedup, 5.72x higher energy efficiency, and 7.91x greater area efficiency for CNN inference tasks."