The Weather Research and Forecasting (WRF) model is an atmospheric model that solves the 3D Euler equations using finite differences. It supports distributed-memory parallelism through domain decomposition (MPI) and shared-memory parallelism (OpenMP) within each subdomain.
One computationally expensive microphysics parameterization in WRF is the Fast Spectral-Bin Microphysics (FSBM) scheme, which calculates grid-resolved cloud condensate variables. FSBM uses discrete size intervals (bins) for cloud droplets and raindrops, and its computational cost scales quadratically with the number of bins per grid point.
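The quadratic scaling can be illustrated with a minimal Python sketch. It assumes the cost comes from a nested loop over all pairs of bins (for example, interactions between drops of different sizes), which this summary does not spell out. The function name `pairwise_interactions` and the bin counts are hypothetical, chosen only for illustration, and are not taken from the FSBM source.

```python
def pairwise_interactions(num_bins: int) -> int:
    """Count the (i, j) bin pairs visited by a naive double loop.

    Assumption: each pair of bins costs one unit of work at a grid point.
    """
    count = 0
    for i in range(num_bins):
        for j in range(num_bins):
            count += 1  # one bin-pair interaction
    return count


# Example bin counts (illustrative values only).
for n in (33, 66):
    print(n, pairwise_interactions(n))
```

Under this assumption, doubling the number of bins quadruples the work per grid point, which is why finer bin resolution becomes expensive quickly.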
To take advantage of GPU resources on the Perlmutter supercomputer at NERSC, the authors ported parts of the FSBM routine to NVIDIA GPUs using OpenMP device offloading directives. They explored an optimization workflow that combines runtime profilers (GNU gprof and NVIDIA Nsight Systems) with the Codee static code analysis tool.
The key optimizations resulted in a 2.08x overall speedup for the CONUS-12km thunderstorm test case, with the FSBM routine itself seeing a 2.99x speedup. Further evaluation showed that the GPU version maintains accuracy relative to the CPU version, agreeing to 3-6 digits for state variables and 1-5 digits for microphysics variables.
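Two pieces of arithmetic behind these figures can be sketched in Python. The first applies Amdahl's law to estimate what share of the original runtime FSBM would occupy if it were the only part that got faster, which is an assumption and not something the summary states. The second shows one common way to define "digits of agreement" as the negative base-10 logarithm of the relative difference; the authors' exact metric is not given here, so this definition, like the function names, is hypothetical.

```python
import math


def implied_fraction(overall_speedup: float, kernel_speedup: float) -> float:
    """Runtime fraction f spent in the accelerated kernel.

    Amdahl's law: 1/S = (1 - f) + f/s, so f = (1 - 1/S) / (1 - 1/s).
    Assumption: everything outside the kernel runs at its original speed.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


def digits_of_agreement(reference: float, value: float) -> float:
    """Matching significant digits, defined as -log10(relative difference).

    Assumed definition; identical inputs are reported as infinite agreement.
    """
    if reference == value:
        return math.inf
    scale = max(abs(reference), abs(value))  # nonzero whenever the inputs differ
    return -math.log10(abs(value - reference) / scale)


# Speedups quoted in the summary: 2.08x overall, 2.99x for FSBM.
print(f"{implied_fraction(2.08, 2.99):.2f}")  # about 0.78
```

Under the stated assumption, FSBM would account for roughly 78% of the original runtime, which is consistent with it being described as a computationally expensive scheme.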
The authors also discuss the limitations of the current implementation, such as the low arithmetic intensity of the FSBM scheme, which leaves it memory-bound, and outline plans for future optimizations targeting other computationally expensive routines in WRF.
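Why low arithmetic intensity limits GPU performance can be sketched with a simple roofline estimate in Python. The peak compute rate, memory bandwidth, and kernel intensity below are illustrative round numbers, not measurements from the paper or specifications of Perlmutter's GPUs, and `attainable_flops` is a hypothetical helper name.

```python
def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity  -- arithmetic intensity in FLOP per byte moved
    peak_flops -- peak compute rate in FLOP/s
    bandwidth  -- memory bandwidth in byte/s
    """
    return min(peak_flops, intensity * bandwidth)


# Illustrative hardware numbers (assumptions, not vendor specifications).
PEAK = 10e12        # 10 TFLOP/s
BANDWIDTH = 1.5e12  # 1.5 TB/s

# Intensity at which a kernel stops being limited by memory traffic.
ridge = PEAK / BANDWIDTH
print(f"ridge point: {ridge:.2f} FLOP/byte")

# A hypothetical low-intensity kernel at 0.5 FLOP/byte.
bound = attainable_flops(0.5, PEAK, BANDWIDTH)
print(f"attainable: {bound / 1e12:.2f} TFLOP/s")
```

A kernel whose intensity sits below the ridge point is capped by memory bandwidth, so faster arithmetic alone does not help; reducing data movement or increasing reuse does.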
Key insights distilled from: Chayanon (Na... at arxiv.org, 09-12-2024
https://arxiv.org/pdf/2409.07232.pdf