Core Concepts
GPU-RANC provides a highly scalable CUDA-accelerated simulation framework that enables rapid exploration and optimization of neuromorphic architectures by achieving up to 780x speedup over the serial CPU-based RANC simulator.
Abstract
The paper introduces GPU-RANC, a CUDA-accelerated implementation of the Reconfigurable Architecture for Neuromorphic Computing (RANC) simulation framework. RANC is an open-source ecosystem that allows hardware architects and application engineers to investigate performance bottlenecks and explore design optimizations for neuromorphic computing architectures.
The key highlights and insights are:
Parallelization Approach:
Parallelized the Neuron Block, Router, and Scheduler components of RANC to exploit the massive parallelism of GPUs.
Implemented core-level, grid-level, and synapse-level optimizations for the Neuron Block, achieving up to 8905x speedup on Neuron Block computations relative to serial RANC.
Optimized the Router and Scheduler components to leverage the GPU's parallel processing capabilities.
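To make the core-level and synapse-level decomposition concrete, here is a minimal sketch in plain Python. It assumes a simplified integrate-leak-fire neuron, and the names `update_neuron`, `tick`, and the dictionary layout are hypothetical. It is not GPU-RANC's actual kernel code. It only shows that each (core, neuron) update is independent, which is the property a GPU implementation can exploit.

```python
# Illustrative sketch only: a simplified neuron model with hypothetical names,
# not the kernels used by GPU-RANC.

def update_neuron(spikes, connections, weights, potential, leak, threshold, reset):
    """Update one neuron for one tick and return (new_potential, fired)."""
    # Synapse-level work: add the weight of every connected axon that spiked.
    for axon_spiked, connected, weight in zip(spikes, connections, weights):
        if axon_spiked and connected:
            potential += weight
    potential += leak
    if potential >= threshold:
        return reset, True
    return potential, False


def tick(cores):
    """Advance every neuron in every core by one tick."""
    fired = {}
    # Each (core, neuron) pair reads shared inputs and writes only its own
    # state, so these iterations are independent. On a GPU, each pair could
    # be assigned to its own thread instead of running in this serial loop.
    for c, core in enumerate(cores):
        for n in range(len(core["potentials"])):
            new_potential, did_fire = update_neuron(
                core["spikes"], core["connections"][n], core["weights"][n],
                core["potentials"][n], core["leak"], core["threshold"],
                core["reset"])
            core["potentials"][n] = new_potential
            fired[(c, n)] = did_fire
    return fired


# One core with three axons and two neurons (made-up values).
cores = [{
    "spikes": [1, 0, 1],
    "connections": [[1, 1, 0], [0, 1, 1]],
    "weights": [[2, 5, 7], [1, 1, 5]],
    "potentials": [0, 0],
    "leak": -1,
    "threshold": 3,
    "reset": 0,
}]
fired = tick(cores)
print(fired)  # {(0, 0): False, (0, 1): True}
```

In this example, neuron 0 accumulates 2, leaks to 1, and stays below the threshold. Neuron 1 accumulates 5, leaks to 4, fires, and resets to 0.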
Performance Evaluation:
Evaluated the GPU-RANC implementation across various applications, including MNIST, CIFAR-10, and vector-matrix multiplication (VMM).
Demonstrated up to 780x speedup for the MNIST-512 core application compared to the serial CPU-based RANC simulator.
Observed significant speedups across all test cases, including 521x for the TrueNorth Reference application.
Significance and Impact:
The GPU-RANC framework enables rapid exploration and optimization of neuromorphic architectures by drastically reducing simulation times.
This allows hardware architects and application engineers to conduct more comprehensive studies and converge to optimal neuromorphic designs faster.
The ability to simulate large-scale neuromorphic systems in a matter of seconds opens up new possibilities for researching non-cognitive applications on neuromorphic platforms.
Overall, the GPU-RANC framework provides a powerful tool for the neuromorphic computing research community to accelerate the development and optimization of energy-efficient neuromorphic architectures.
Stats
The serial RANC simulator takes 5.6 hours to complete the MNIST-512 core application.
The GPU-RANC implementation reduces the simulation time for the MNIST-512 core application from 5.6 hours to 26 seconds, a 780x speedup.
The GPU-RANC implementation achieves up to 8905x speedup for the Neuron Block computations compared to the serial RANC.
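As a quick sanity check, the headline speedup can be recomputed from the two rounded timings quoted above. The variable names are illustrative.

```python
# Recompute the end-to-end speedup from the rounded timings quoted above.
serial_seconds = 5.6 * 3600   # 5.6 hours for the serial RANC simulator
gpu_seconds = 26              # GPU-RANC run time for the same application

speedup = serial_seconds / gpu_seconds
print(f"{speedup:.0f}x")  # 775x
```

The rounded figures give roughly 775x, in line with the reported 780x. The small gap most likely comes from rounding in the quoted timings.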
Quotes
"GPU-RANC offers a viable approach in exploring non-cognitive application mapping within research."
"Several time consuming neuromorphic simulations, originally requiring in the magnitude of hours to complete, can now be completed in the magnitude of seconds with the benefit of GPU-RANC."