Core Concepts
This work presents a portable and highly parallel implementation of the Material Point Method (MPM) for simulating compressible gas dynamics, with a focus on achieving performance portability across different hardware architectures.
Abstract
The authors present a portable and highly parallel implementation of the Material Point Method (MPM) for simulating compressible gas dynamics. The key highlights are:
The implementation aims to achieve a good compromise between portability and efficiency by using the Thrust C++ template library, which allows the code to be compiled and executed on a variety of hardware architectures, including NVIDIA GPUs, AMD GPUs, and multi-core CPUs.
The algorithm is designed to take advantage of the data locality and fine-grained parallelism offered by modern hardware accelerators such as GPUs. One specific optimization is to postpone particle movement to the end of the time loop, which avoids the need to store basis function values at previous particle positions.
The implementation is evaluated on several benchmark test cases, including supersonic flow past solid obstacles, transonic flow past an aerofoil, and the Taylor-Green vortex problem. The results demonstrate the ability of the MPM approach to accurately capture the main flow features, such as shock waves and flow separation.
A detailed performance analysis is provided, showing the scalability of the implementation on GPUs and its portability to multi-core CPUs. The profiling results highlight the importance of data locality and the impact of particle reordering on the performance of the key computational kernels.
The authors discuss the trade-offs between portability and optimization, and propose an alternative algorithm to avoid the need for atomic operations in the Particle-to-Grid (P2G) kernel, which can be a performance bottleneck on some architectures.
Overall, the work presents a significant step towards the realization of a monolithic MPM solver for Fluid-Structure Interaction (FSI) problems at all Mach numbers up to the supersonic regime, with a focus on achieving performance portability across diverse hardware platforms.
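The atomics issue in the P2G step, and the role of particle ordering, can be illustrated with a minimal C++ sketch. This is not the authors' Thrust code, nor necessarily the alternative algorithm they propose. It assumes a hypothetical 1D grid with linear hat basis functions, and the names `Grid1D`, `cell_of`, `p2g_scatter` and `p2g_gather` are made up for illustration. The scatter version loops over particles, so two particles in the same cell update the same node (an atomic add would be needed if that loop ran in parallel). The gather version first groups particles by cell and then loops over nodes, so each iteration writes to exactly one node.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 1D grid: nodes at x_i = i * h, linear (hat) basis functions.
struct Grid1D {
    std::size_t n_nodes;
    double h;
};

// Index of the cell [c*h, (c+1)*h) that contains x.
inline std::size_t cell_of(const Grid1D& g, double x) {
    assert(g.n_nodes >= 2 && x >= 0.0 && x < (g.n_nodes - 1) * g.h);
    const auto c = static_cast<std::size_t>(std::floor(x / g.h));
    return std::min(c, g.n_nodes - 2);  // guard against round-off at the right edge
}

// Particle-centric P2G: every particle adds its mass to its two neighbouring nodes.
// Run in parallel over p, the two += lines would have to be atomic adds.
std::vector<double> p2g_scatter(const Grid1D& g, const std::vector<double>& x,
                                const std::vector<double>& m) {
    assert(x.size() == m.size());
    std::vector<double> node_mass(g.n_nodes, 0.0);
    for (std::size_t p = 0; p < x.size(); ++p) {
        const std::size_t c = cell_of(g, x[p]);
        const double xi = x[p] / g.h - static_cast<double>(c);  // local coordinate in [0, 1)
        node_mass[c] += (1.0 - xi) * m[p];
        node_mass[c + 1] += xi * m[p];
    }
    return node_mass;
}

// Node-centric P2G: particles are grouped by cell (counting sort), then each node
// gathers from the two adjacent cells. Each outer iteration writes only node_mass[i].
std::vector<double> p2g_gather(const Grid1D& g, const std::vector<double>& x,
                               const std::vector<double>& m) {
    assert(x.size() == m.size());
    const std::size_t n_cells = g.n_nodes - 1;

    // cell_start[c] .. cell_start[c + 1] is the range of cell c inside `order`.
    std::vector<std::size_t> cell_start(n_cells + 1, 0);
    for (double xp : x) ++cell_start[cell_of(g, xp) + 1];
    std::partial_sum(cell_start.begin(), cell_start.end(), cell_start.begin());

    std::vector<std::size_t> order(x.size());
    std::vector<std::size_t> cursor(cell_start.begin(), cell_start.end() - 1);
    for (std::size_t p = 0; p < x.size(); ++p) order[cursor[cell_of(g, x[p])]++] = p;

    std::vector<double> node_mass(g.n_nodes, 0.0);
    for (std::size_t i = 0; i < g.n_nodes; ++i) {
        double sum = 0.0;
        if (i > 0) {  // particles in cell i-1 see node i as their right node
            for (std::size_t k = cell_start[i - 1]; k < cell_start[i]; ++k) {
                const std::size_t p = order[k];
                sum += (x[p] / g.h - static_cast<double>(i - 1)) * m[p];
            }
        }
        if (i < n_cells) {  // particles in cell i see node i as their left node
            for (std::size_t k = cell_start[i]; k < cell_start[i + 1]; ++k) {
                const std::size_t p = order[k];
                sum += (1.0 - (x[p] / g.h - static_cast<double>(i))) * m[p];
            }
        }
        node_mass[i] = sum;
    }
    return node_mass;
}
```

Both functions return the same nodal masses up to floating-point summation order. The grouping step also places particles of the same cell next to each other in `order`, which is the kind of reordering that improves data locality in the kernels; whether the gather form pays off in practice depends on the architecture and on the cost of the sort.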
Stats
The simulation results are presented in non-dimensional units, with the following key parameters:
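Specific heat ratio: $\gamma = 1.4$ (for a diatomic gas)
Unperturbed speed of sound: $c_{s,\infty} = 1$
Unperturbed Mach number: $M_\infty = v_\infty$ (the free-stream speed equals the Mach number because $c_{s,\infty} = 1$)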
The time step is chosen to satisfy the CFL condition:
$\Delta t \le \frac{1}{2}\,\frac{h_{\min}}{v_{\max} + c_{s,\max}}$
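As a minimal sketch of how this bound might be evaluated, the C++ functions below take the largest particle speed and the largest speed of sound and return the largest admissible time step. The function names and the array layout are hypothetical, and reading the minimum length in the bound as the smallest grid spacing is an assumption. The ideal-gas relation $c_s = \sqrt{\gamma p / \rho}$ is standard; choosing a unit free-stream density and a free-stream pressure of $1/\gamma$ is only one normalization that yields a unit free-stream speed of sound.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Ideal-gas speed of sound, c_s = sqrt(gamma * p / rho).
inline double sound_speed(double gamma, double p, double rho) {
    assert(gamma > 1.0 && p > 0.0 && rho > 0.0);
    return std::sqrt(gamma * p / rho);
}

// Largest time step allowed by dt <= cfl * h_min / (v_max + c_s_max).
// `speed` holds the velocity magnitude of each particle and `cs` its local
// speed of sound (hypothetical layout); cfl = 1/2 matches the bound above.
double cfl_time_step(double h_min, const std::vector<double>& speed,
                     const std::vector<double>& cs, double cfl = 0.5) {
    assert(h_min > 0.0 && !speed.empty() && speed.size() == cs.size());
    const double v_max = *std::max_element(speed.begin(), speed.end());
    const double cs_max = *std::max_element(cs.begin(), cs.end());
    return cfl * h_min / (v_max + cs_max);
}
```

For example, with a grid spacing of 0.01, particle speeds of 2.0 and 1.5, and sound speeds of 1.0 and 1.2, `cfl_time_step(0.01, {2.0, 1.5}, {1.0, 1.2})` evaluates 0.5 * 0.01 / (2.0 + 1.2). The two maxima are taken separately, as in the formula, which is slightly more conservative than taking the maximum of the sum per particle.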
Quotes
"The recent evolution of software and hardware technologies is leading to a renewed computational interest in Particle-In-Cell (PIC) methods such as the Material Point Method (MPM)."
"Notwithstanding its Lagrangian character, MPM also employs a background Cartesian grid to compute differential quantities and solve the motion equation, thus mediating the particle-particle interactions, and taking advantage of both Eulerian and Lagrangian approaches."
"One of NVIDIA's solutions for performance portability involves its own C++ compiler nvc++, using which one rewrites algorithm steps relying on C++ Standard Template Library (STL), specifying a parallel execution policy."