Core Concepts
This study analyzes GPU acceleration for computational fluid dynamics (CFD) simulations on HPC systems, evaluating its impact on simulation speed, power consumption, and cost.
Abstract
This paper presents a comprehensive analysis of GPU acceleration for computational fluid dynamics (CFD) simulations on HPC systems. The study utilizes two distinct CFD simulations, one for external flow and one for internal flow, to assess the performance, power consumption, and cost implications of GPU acceleration compared to traditional CPU-only architectures.
The key findings are:
Simulation Speed:
GPU acceleration, particularly with the NVIDIA A100 GPU, significantly reduces the total simulation time by up to 95% compared to CPU-only architectures.
The mean iteration time also shows a similar trend, with the A100 GPU being up to 65% faster than the V100 GPU.
Initialization Speed:
The Cascade Lake architecture is notably slower in the initialization stage compared to the Sapphire Rapids, V100, and A100 architectures.
Researchers conducting CFD simulations with large computational meshes, complex domain setups, or extensive initialization steps should consider the Sapphire Rapids, V100, or A100 architectures to minimize this overhead.
Power Consumption:
While GPU acceleration is generally more energy-efficient than CPU-only architectures, the V100's power consumption was higher than expected in some scenarios because two V100 cards were required to provide sufficient VRAM.
Users should carefully assess VRAM requirements, simulation mesh sizes, and available GPU configurations to select the most energy-efficient setup.
Cost:
Despite higher queue charge rates, the speed of GPU acceleration makes it more cost-effective than CPU-only approaches.
The V100 and A100 architectures stand out as the most cost-effective choices for researchers, while the Broadwell architecture is the most cost-effective CPU-only option.
The comprehensive nature of this study, covering speed, power, and cost implications, provides valuable insights for researchers and HPC administrators when selecting the appropriate hardware and architecture for their CFD simulations.
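The speed-up figures above (e.g. "up to 95%") compare total wall time across architectures. As a minimal sketch of that arithmetic: the CPU time below is the Broadwell external-case figure from the Stats section, while the GPU time is a hypothetical placeholder, since the paper's A100 timings are not reproduced here.

```python
def percent_reduction(baseline_s: float, accelerated_s: float) -> float:
    """Percentage reduction in wall time relative to a baseline run."""
    return 100 * (baseline_s - accelerated_s) / baseline_s

cpu_total = 1192.8  # Broadwell, external case, 50 iterations (from Stats)
gpu_total = 60.0    # hypothetical accelerated time -- NOT from the source
print(f"{percent_reduction(cpu_total, gpu_total):.1f}% reduction")
```

With these inputs the reduction works out to roughly 95%, matching the scale of improvement the study reports for the A100.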
Stats
The total simulation time for the external case on the Broadwell architecture was 1192.8 seconds for 50 iterations and 2318.4 seconds for 100 iterations.
The total simulation time for the internal case on the Broadwell architecture was 3272.1 seconds for 50 iterations and 6946.0 seconds for 100 iterations.
The mean iteration time for the external case on the Broadwell architecture was 23.8 seconds for 50 iterations and 23.1 seconds for 100 iterations.
The mean iteration time for the internal case on the Broadwell architecture was 65.4 seconds for 50 iterations and 69.4 seconds for 100 iterations.
The power consumption for the external case on the Broadwell architecture was 44.7 Wh for 50 iterations and 86.9 Wh for 100 iterations.
The power consumption for the internal case on the Broadwell architecture was 122.7 Wh for 50 iterations and 260.0 Wh for 100 iterations.
The service unit cost for the external case on the Broadwell architecture was 6.7 SU for 50 iterations and 12.2 SU for 100 iterations.
The service unit cost for the internal case on the Broadwell architecture was 16.9 SU for 50 iterations and 35.0 SU for 100 iterations.
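The per-iteration, power-draw, and charge-rate figures implied by these totals can be derived with simple arithmetic. The sketch below assumes mean iteration time ≈ total time / iterations (the reported means differ slightly, presumably because initialization time is excluded) and average draw in watts = Wh / hours; these relationships are my assumptions, not stated in the source.

```python
# Broadwell figures copied from the Stats section above:
# (total seconds, iterations, energy in Wh, cost in SU)
cases = {
    "external-50":  (1192.8,  50,  44.7,  6.7),
    "external-100": (2318.4, 100,  86.9, 12.2),
    "internal-50":  (3272.1,  50, 122.7, 16.9),
    "internal-100": (6946.0, 100, 260.0, 35.0),
}

for name, (total_s, iters, wh, su) in cases.items():
    hours = total_s / 3600
    print(f"{name}: ~{total_s / iters:.1f} s/iter, "
          f"~{wh / hours:.0f} W avg draw, ~{su / hours:.1f} SU/h")
```

Running this, the implied average draw comes out near 135 W for all four runs, suggesting the Wh figures scale almost linearly with wall time on this architecture.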
Quotes
"GPU acceleration remains a clearly optimal choice, but the selection of individual GPUs is less so. Users with smaller mesh requirements may see practical benefits such as reduced cost from single GPU use, however larger models benefit significantly from the adoption of larger but more costly, GPU architectures."
"Initialization time is a crucial factor for researchers using commercial codes like ANSYS Fluent, where limited access to underlying code makes optimization challenging. The Broadwell and Cascade Lake architectures are notably slower in this regard, prompting researchers initialising complex simulations to opt for architectures such as the Sapphire Rapids, V100, or A100."