Core Concepts
The author evaluates and compares the Graphcore IPU, SambaNova RDU, and GPU platforms for AI/ML acceleration, aiming to characterize the performance trade-offs and capabilities of each hardware accelerator.
Abstract
The content evaluates emerging AI/ML accelerators, focusing on the Graphcore IPU, SambaNova RDU, and GPU platforms. It covers their architectural details, memory hierarchies, computing resources, programming models, compiler stacks, parallelism support, system configurations, and benchmark results across a range of DNN operators. The study highlights the potential of data-flow architectures to improve performance and energy efficiency for AI/ML workloads. Benchmarks span GEMM, BERT models, 2D convolutions, SPMM tasks, streaming operators such as element-wise square, and non-linear operators such as ReLU and Sigmoid across all platforms. The findings aim to contribute to a better understanding of current hardware acceleration technologies in the field of AI/ML.
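The operator-level benchmarking described above can be sketched as a simple wall-clock microbenchmark. This is a minimal illustrative sketch, not the paper's actual harness: the operator set matches the ones named in the abstract, but the matrix sizes, repetition count, and NumPy backend are assumptions for demonstration only.

```python
import time
import numpy as np

def bench(name, fn, reps=10):
    """Time one operator: warm up once, then report the mean over reps runs."""
    fn()  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    avg_ms = (time.perf_counter() - t0) / reps * 1e3
    print(f"{name:12s} {avg_ms:8.3f} ms")

rng = np.random.default_rng(0)
a = rng.standard_normal((1024, 1024), dtype=np.float32)  # size is illustrative
b = rng.standard_normal((1024, 1024), dtype=np.float32)

# Dense GEMM, a streaming operator (element-wise square),
# and the two non-linear operators mentioned in the abstract.
bench("gemm",    lambda: a @ b)
bench("square",  lambda: a * a)
bench("relu",    lambda: np.maximum(a, 0.0))
bench("sigmoid", lambda: 1.0 / (1.0 + np.exp(-a)))
```

On real accelerators the same pattern applies, but each platform's runtime (Poplar for the IPU, SambaFlow for the RDU, CUDA/ROCm for GPUs) supplies the operator implementations and timing hooks.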
Stats
Each GC200 IPU chip contains 1472 independent IPU-tiles.
The GC200 IPU chip can process up to 8832 separate program threads in parallel.
Each PCU in the SN10 RDU has six SIMD stages with 16 SIMD lanes each.
The SN10 RDU provides 320 MB of on-chip SRAM.
Nvidia V100 has a die size of 815 mm^2 with 21.1 billion transistors.
AMD MI100 has a die size of 750 mm^2 with 25.6 billion transistors.
The GC200 IPU chip features Accumulating Matrix Product (AMP) units for floating-point computation.
Nvidia GPUs have Tensor Cores that perform multiple FP16/FP32 mixed-precision fused multiply-add operations within a single cycle.
Quotes
"The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators."
"Traditional von Neumann architectures are increasingly challenged by modern AI/ML workloads."
"Graphcore's Intelligence Processing Unit (IPU) stands out for its unique approach to hardware acceleration."