Comprehensive Analysis of GPU Acceleration for Computational Fluid Dynamics on HPC Systems: Evaluating Speed, Power, and Cost Implications


Core Concepts
This study provides a comprehensive analysis of GPU acceleration for computational fluid dynamics (CFD) simulations on HPC systems, evaluating the impact on simulation speed, power consumption, and cost.
Abstract
This paper presents a comprehensive analysis of GPU acceleration for computational fluid dynamics (CFD) simulations on HPC systems. The study utilizes two distinct CFD simulations, one for external flow and one for internal flow, to assess the performance, power consumption, and cost implications of GPU acceleration compared to traditional CPU-only architectures. The key findings are:

Simulation Speed: GPU acceleration, particularly with the NVIDIA A100 GPU, significantly reduces the total simulation time, by up to 95% compared to CPU-only architectures. The mean iteration time shows a similar trend, with the A100 GPU up to 65% faster than the V100 GPU.

Initialization Speed: The Cascade Lake architecture is notably slower in the initialization stage than the Sapphire Rapids, V100, and A100 architectures. Researchers conducting CFD simulations with large computational meshes, complex domain setups, or extensive initialization steps should consider the Sapphire Rapids, V100, or A100 architectures to minimize this overhead.

Power Consumption: While GPU acceleration is generally more energy-efficient than CPU-only architectures, the power consumption of the V100 GPU was higher than expected in certain scenarios because dual V100 cards were required to provide sufficient VRAM. Users should carefully assess VRAM requirements, simulation mesh sizes, and available GPU configurations to select the most energy-efficient setup.

Cost: Despite the higher queue charge rates, the speed of GPU acceleration makes it a more cost-effective option than CPU-only approaches. The V100 and A100 architectures stand out as the most cost-effective choices for researchers, while the Broadwell architecture is the most cost-effective CPU-only option.

The comprehensive nature of this study, covering speed, power, and cost implications, provides valuable insights for researchers and HPC administrators when selecting the appropriate hardware and architecture for their CFD simulations.
Stats
Broadwell architecture results (50 iterations / 100 iterations):
Total simulation time: external 1192.8 s / 2318.4 s; internal 3272.1 s / 6946.0 s
Mean iteration time: external 23.8 s / 23.1 s; internal 65.4 s / 69.4 s
Power consumption: external 44.7 Wh / 86.9 Wh; internal 122.7 Wh / 260.0 Wh
Service unit cost: external 6.7 SU / 12.2 SU; internal 16.9 SU / 35.0 SU
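For readers unfamiliar with how the energy and service-unit figures relate to walltime, the minimal Python sketch below shows the usual back-of-envelope relationships: energy equals average power draw times walltime, and SU cost scales with walltime and the queue charge rate. The 135 W draw and 20 SU/hour rate are illustrative placeholders, not values reported in the paper; they are chosen only so the output lands in the same ballpark as the Broadwell external-case figures above.

```python
# Back-of-envelope check of how energy (Wh) and service-unit (SU) figures
# relate to walltime. The charge rate and average power draw used below are
# illustrative placeholders, not values reported in the paper.

def energy_wh(avg_power_w: float, walltime_s: float) -> float:
    """Energy in watt-hours = average power draw (W) x walltime (h)."""
    return avg_power_w * walltime_s / 3600.0

def service_units(walltime_s: float, charge_rate_su_per_hour: float) -> float:
    """SU cost = walltime (h) x queue charge rate (SU/h), a common HPC scheme."""
    return walltime_s / 3600.0 * charge_rate_su_per_hour

if __name__ == "__main__":
    walltime = 1192.8  # external case, Broadwell, 50 iterations (seconds)
    print(f"Energy at an assumed 135 W average draw: {energy_wh(135.0, walltime):.1f} Wh")
    print(f"Cost at an assumed 20 SU/hour charge rate: {service_units(walltime, 20.0):.1f} SU")
```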
Quotes
"GPU acceleration remains a clearly optimal choice, but the selection of individual GPUs is less so. Users with smaller mesh requirements may see practical benefits such as reduced cost from single GPU use, however larger models benefit significantly from the adoption of larger but more costly, GPU architectures." "Initialization time is a crucial factor for researchers using commercial codes like ANSYS Fluent, where limited access to underlying code makes optimization challenging. The Broadwell and Cascade Lake architectures are notably slower in this regard, prompting researchers initialising complex simulations to opt for architectures such as the Sapphire Rapids, V100, or A100."

Deeper Inquiries

What other factors, beyond speed, power, and cost, should researchers consider when selecting hardware and architectures for their CFD simulations on HPC systems?

In addition to speed, power, and cost, researchers should consider several other factors when selecting hardware and architectures for their CFD simulations on HPC systems.

One crucial factor is memory capacity and bandwidth. CFD simulations often involve large datasets and complex mesh structures, requiring significant memory resources to handle the computations efficiently. Researchers should ensure that the selected hardware has sufficient memory capacity and bandwidth to support the simulation requirements without causing bottlenecks.

Another important consideration is scalability. As simulations grow in complexity or size, the ability of the hardware to scale with the workload becomes crucial. Researchers should choose architectures that can scale up to accommodate larger simulations without sacrificing performance or incurring disproportionate costs.

Reliability and stability are also essential. HPC systems are often used for long-running simulations that span days or even weeks, and hardware failures or instability can lead to significant disruptions and data loss, so architectures known for their reliability should be preferred.

Finally, compatibility with and support for parallel processing and multi-threading are vital. CFD simulations are inherently parallelizable, and hardware optimized for efficient parallel computing can significantly speed up the computations and improve overall efficiency.
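As a rough illustration of the memory-sizing point above, the sketch below estimates whether a given mesh fits in a device's memory. The roughly 2 kB-per-cell footprint and the example device memory sizes are rule-of-thumb assumptions, not figures from the study; actual requirements depend heavily on the solver, the physical models, and the precision used.

```python
# Rough memory-requirement estimate for sizing a CFD run against available
# RAM/VRAM. The bytes-per-cell figure is a rule-of-thumb assumption and varies
# widely with solver, discretization, and the number of transported equations.

def estimate_memory_gib(n_cells: int, bytes_per_cell: float = 2_000.0) -> float:
    """Estimated solver memory in GiB for a mesh of n_cells cells."""
    return n_cells * bytes_per_cell / 2**30

def fits_on_device(n_cells: int, device_mem_gib: float, headroom: float = 0.8) -> bool:
    """True if the estimate fits within a safety margin of device memory."""
    return estimate_memory_gib(n_cells) <= headroom * device_mem_gib

if __name__ == "__main__":
    mesh = 30_000_000  # e.g. a hypothetical 30M-cell external-flow mesh
    print(f"Estimated memory: {estimate_memory_gib(mesh):.1f} GiB")
    print(f"Fits on a single 32 GB V100: {fits_on_device(mesh, 32)}")
    print(f"Fits on a single 80 GB A100: {fits_on_device(mesh, 80)}")
```

A mesh that fails this kind of check on a single card is exactly the situation the study flags, where a second V100 is needed for VRAM alone and the expected energy savings shrink.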

How might the trends observed in this study change for more complex CFD simulations, such as those involving multiphase flows or advanced turbulence models?

The trends observed in this study may change for more complex CFD simulations, such as those involving multiphase flows or advanced turbulence models.

In multiphase flow simulations, the computational requirements increase significantly due to the interactions between different phases, leading to larger datasets and more complex calculations. This particularly affects memory usage and processing load, so the speed and power-consumption trends observed in simpler simulations may not translate directly to these scenarios.

Similarly, advanced turbulence models introduce additional computational complexity, requiring more iterations and calculations to achieve convergence. This affects iteration speed and total simulation time, potentially altering the performance characteristics of each architecture. The GPU acceleration benefits observed in simpler simulations are still likely to apply, but the extent of the acceleration and the trade-offs between speed, power, and cost may vary.

Overall, for more complex CFD simulations, researchers should carefully assess the specific requirements of their cases and tailor their hardware and architecture choices accordingly to achieve optimal performance and efficiency.
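To make the scaling argument a little more concrete, the illustrative sketch below compares the approximate number of transported fields solved per cell for a few model combinations. The equation counts are rough assumptions rather than values from the study, but they indicate why multiphase and higher-order turbulence models increase per-iteration memory and work.

```python
# Illustrative comparison of per-cell workload as the physics gets richer.
# Equation counts are approximate; exact counts depend on the solver and models.

MODELS = {
    # name: transported fields per cell (pressure/continuity, momentum,
    # turbulence, extra per-phase fields), purely illustrative
    "single-phase, k-epsilon":       1 + 3 + 2,
    "single-phase, Reynolds stress": 1 + 3 + 7,
    "two-phase VOF, k-omega SST":    1 + 3 + 2 + 1,
    "Eulerian two-phase, k-epsilon": 1 + 2 * 3 + 2 + 1,
}

baseline = MODELS["single-phase, k-epsilon"]
for name, n_eq in MODELS.items():
    print(f"{name:32s} ~{n_eq} equations/cell, "
          f"~{n_eq / baseline:.1f}x baseline memory/work per iteration")
```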

What opportunities exist for further optimization and acceleration of the initialization stage of CFD simulations, particularly in commercial software where access to the underlying code is limited?

Opportunities for further optimization and acceleration of the initialization stage of CFD simulations, particularly in commercial software with limited access to the underlying code, exist through several strategies.

One approach is to streamline the initialization process by automating repetitive tasks and optimizing the sequence of operations. Researchers can develop scripts or workflows that automate mesh import, boundary-condition definition, solver settings, and other initialization steps to reduce manual intervention and speed up the process.

Leveraging pre-processing tools and techniques can also help. Advanced mesh-generation software that produces high-quality meshes efficiently can significantly reduce the time required for meshing and domain setup, and adaptive meshing strategies that dynamically refine the mesh based on simulation requirements can improve both accuracy and efficiency.

Finally, exploring parallelization opportunities during initialization can enhance performance. Although initialization may not benefit as much from parallel processing as the solve stage, identifying parallelizable tasks within it and optimizing them for multi-threading or distributed computing can shorten overall initialization times.

By combining automation, pre-processing tools, and parallelization strategies, researchers can accelerate the initialization stage of CFD simulations even in commercial software environments where access to the underlying code is limited.
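As an illustration of the scripting approach, the minimal sketch below generates a simple ANSYS Fluent journal file and launches the solver headless in batch mode. The TUI command paths and launcher flags shown are typical but version-dependent assumptions, and the file names are hypothetical; they should be checked against the local Fluent installation before use.

```python
# Minimal sketch of automating Fluent case read, initialization, and solve in
# batch mode by generating a TUI journal and launching the solver without a GUI.
# TUI command paths and launcher flags are version-dependent; verify them
# against your Fluent installation before relying on this.
import subprocess
from pathlib import Path

def write_journal(case_file: str, iterations: int, journal_path: Path) -> None:
    """Write a journal that reads the case, initializes, iterates, and saves."""
    journal_path.write_text("\n".join([
        f'/file/read-case "{case_file}"',
        "/solve/initialize/hyb-initialization",   # hybrid initialization
        f"/solve/iterate {iterations}",
        '/file/write-case-data "result.cas.h5"',
        "/exit yes",
    ]) + "\n")

def run_fluent(journal_path: Path, n_procs: int = 8) -> None:
    """Launch Fluent headless (-g) with the journal as scripted input (-i)."""
    subprocess.run(
        ["fluent", "3ddp", f"-t{n_procs}", "-g", "-i", str(journal_path)],
        check=True,
    )

if __name__ == "__main__":
    jou = Path("run.jou")
    write_journal("external_flow.cas.h5", iterations=100, journal_path=jou)
    run_fluent(jou)
```

The same pattern can be extended to generate one journal per case in a parameter sweep, which removes most of the manual setup overhead that the quoted passage identifies as hard to optimize inside closed-source solvers.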