Core Concepts
A novel DRAM cache design for GPUs backed by Storage-Class Memory (SCM), aimed at overcoming GPU memory capacity limits and improving performance.
Summary
The article discusses the challenges faced by GPUs due to limited memory capacity, especially in critical workloads like deep learning. It introduces a solution using high-capacity Storage-Class Memory (SCM) and DRAM cache to enhance memory capacity and performance. The proposed DRAM cache design considers GPU thread characteristics, SCM properties, and spatial locality to optimize caching efficiency. Additionally, a Configurable Tag Cache (CTC) is suggested to reduce DRAM cache probe traffic. Power management techniques are also proposed to address SCM's power consumption and thermal issues.
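To make the idea of cutting DRAM cache probe traffic concrete, here is a minimal Python sketch under stated assumptions. It models a direct-mapped DRAM cache in front of SCM whose tags are stored in DRAM, so checking hit or miss normally costs a DRAM probe, plus a small on-chip LRU tag cache that holds copies of recently used tags. The class `DramCacheWithTagCache` and its parameters are hypothetical. It is a generic tag cache, not the paper's Configurable Tag Cache, whose organization may differ. The sketch only counts how many lookups still have to read a tag from DRAM.

```python
from collections import OrderedDict


class DramCacheWithTagCache:
    """Hypothetical direct-mapped DRAM cache (tags kept in DRAM) in front of SCM,
    with a small on-chip LRU tag cache that can resolve lookups without a DRAM probe."""

    def __init__(self, num_sets: int, tag_cache_entries: int, block_bytes: int = 64):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.dram_tags = [None] * num_sets   # tag array stored in DRAM
        self.tag_cache = OrderedDict()       # set index -> tag, in LRU order
        self.tag_cache_entries = tag_cache_entries
        self.dram_probes = 0                 # lookups that had to read a tag from DRAM
        self.scm_reads = 0                   # misses served from SCM

    def _split(self, addr: int):
        # Map a byte address to (set index, tag) for a direct-mapped cache.
        block = addr // self.block_bytes
        return block % self.num_sets, block // self.num_sets

    def _remember(self, index: int, tag) -> None:
        # Record the set's current tag on chip and evict the LRU entry if full.
        self.tag_cache[index] = tag
        self.tag_cache.move_to_end(index)
        if len(self.tag_cache) > self.tag_cache_entries:
            self.tag_cache.popitem(last=False)

    def read(self, addr: int) -> bool:
        """Return True on a DRAM cache hit, False if the block was fetched from SCM."""
        index, tag = self._split(addr)
        if index in self.tag_cache:
            stored = self.tag_cache[index]   # resolved on chip, no DRAM tag probe
        else:
            self.dram_probes += 1            # tag must be read from DRAM
            stored = self.dram_tags[index]
        hit = stored == tag
        if not hit:
            self.scm_reads += 1              # fetch the block from SCM
            self.dram_tags[index] = tag      # fill it into the DRAM cache
        self._remember(index, tag)           # keep the on-chip copy coherent
        return hit


if __name__ == "__main__":
    cache = DramCacheWithTagCache(num_sets=1024, tag_cache_entries=64)
    # Four passes over the same 16 consecutive 64-byte blocks.
    for _ in range(4):
        for addr in range(0, 16 * 64, 64):
            cache.read(addr)
    print(cache.dram_probes, cache.scm_reads)  # prints: 16 16
```

In this toy run only the first pass reads tags from DRAM (16 probes instead of 64 without a tag cache). The later passes are resolved on chip and hit in the DRAM cache. Hit/miss decisions for sets whose tags have been evicted from the tag cache still fall back to a DRAM probe.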
Statistics
Compared to HBM, the proposed HMS improves performance by up to 12.5× (2.9× overall) and reduces energy by up to 89.3% (48.1% overall).
The proposed techniques reduce DRAM cache probe traffic by 91-93% and SCM write traffic by 57-75%.
Quotes
"Our proposed GPU memory system can overcome the limited memory capacity and resulting performance degradation from oversubscription of DRAM-only GPUs."
"We propose simple techniques to mitigate SCM’s power consumption and performance impact by adjusting the operation modes of the SCM and DRAM."