Optimized 3D stencil kernels achieve performance improvements of up to 58% on the NVIDIA Hopper GPU architecture compared to the previous Ampere generation. The optimization strategies span the CUDA, OpenACC, and OpenMP programming models and are designed to exploit Hopper's architectural features.
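For readers unfamiliar with the term, a 3D stencil kernel updates each grid cell from a fixed pattern of neighbouring cells. The following is a minimal CPU reference sketch of one sweep, not the optimized GPU kernels summarized above: the 7-point pattern, the function name `stencil7`, and the weights `c0` and `c1` are assumptions chosen for illustration.

```python
def stencil7(grid, c0=0.25, c1=0.125):
    """Apply one 7-point stencil sweep to a 3D grid stored as nested lists.

    Each interior cell becomes c0 * centre + c1 * (sum of its 6 face
    neighbours). Boundary cells are copied through unchanged.
    The weights are illustrative assumptions, not taken from the source.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Write into a separate output grid so every read sees the old values.
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                neighbours = (
                    grid[z - 1][y][x] + grid[z + 1][y][x]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z][y][x - 1] + grid[z][y][x + 1]
                )
                out[z][y][x] = c0 * grid[z][y][x] + c1 * neighbours
    return out
```

Every interior cell reads seven values and writes one, so kernels of this shape tend to be limited by memory traffic rather than arithmetic, which is why GPU implementations typically focus on data reuse across neighbouring cells.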
Seer introduces a machine-learning-based predictor that selects kernels at runtime for irregular workloads, yielding significant performance improvements.
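The general idea of runtime kernel selection can be sketched as follows. This is not Seer's implementation: the sparse matrix-vector workload, the two kernel variants, the row-length feature, and the threshold in `predict_kernel` are all hypothetical, and the hand-written rule merely stands in for a trained model.

```python
from statistics import mean, pvariance


def spmv_row_wise(indptr, indices, data, x):
    """CSR sparse matrix-vector product, one pass per row."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[r], indptr[r + 1]))
        for r in range(len(indptr) - 1)
    ]


def spmv_nonzero_wise(indptr, indices, data, x):
    """Same product, one pass over all nonzeros.

    Stands in for a load-balanced variant suited to skewed row lengths.
    """
    y = [0] * (len(indptr) - 1)
    row = 0
    for k in range(len(data)):
        # Advance to the row that owns nonzero k (skips empty rows).
        while k >= indptr[row + 1]:
            row += 1
        y[row] += data[k] * x[indices[k]]
    return y


KERNELS = {"row_wise": spmv_row_wise, "nonzero_wise": spmv_nonzero_wise}


def predict_kernel(indptr):
    """Pick a kernel name from cheap input features.

    Hypothetical rule: call the input irregular when the variance of the
    row lengths exceeds their mean. A trained predictor would replace this.
    """
    lengths = [indptr[r + 1] - indptr[r] for r in range(len(indptr) - 1)]
    return "nonzero_wise" if pvariance(lengths) > mean(lengths) else "row_wise"


def run_spmv(indptr, indices, data, x):
    """Select a kernel at runtime, run it, and return (kernel name, result)."""
    name = predict_kernel(indptr)
    return name, KERNELS[name](indptr, indices, data, x)
```

Calling `run_spmv` on a matrix with uniform row lengths dispatches to `row_wise`, while a matrix with one very long row dispatches to `nonzero_wise`. Both variants compute the same result, so the prediction affects only which code path runs.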