
Performance Analysis of OpenMC on Intel, NVIDIA, and AMD GPUs


Core Concepts
OpenMC demonstrates performance portability across Intel, NVIDIA, and AMD GPUs using the OpenMP target offloading model.
Abstract
OpenMC is a Monte Carlo neutral particle transport application. Performance portability is demonstrated on the Frontier, Polaris, and Aurora supercomputers, with historical context provided by analysis of legacy GPU and CPU architectures. GPU-specific optimizations include event-based parallelism and particle sorting techniques. Single-node and full-machine performance results are discussed, and weak scaling studies show high efficiency on all systems. Comparison with legacy architectures reveals a widening performance gap between CPUs and GPUs over time.
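
To make the event-based approach concrete, the following sketch (not drawn from the paper) shows how an event-based transport step might be offloaded with OpenMP target directives. The Particle struct and process_collision_events function are hypothetical simplifications rather than OpenMC's actual API: particles awaiting the same event type are gathered into a queue, and one offloaded loop processes that queue so GPU threads stay convergent. Sorting the queue (e.g., by energy or material) before launch is the kind of optimization the paper refers to, since it improves locality of the subsequent cross-section lookups.

```cpp
#include <cstdint>

// Illustrative particle state; OpenMC's real particle class carries far more data.
struct Particle {
  double E;         // energy (eV)
  double xyz[3];    // position
  double uvw[3];    // direction of flight
  int    material;  // current material index
  bool   alive;
};

// One event-based step, sketched: `queue` holds the indices of all particles
// that need a collision event, so the offloaded loop runs a uniform kernel.
// The same pragma compiles for NVIDIA, AMD, and Intel GPUs through each
// vendor's OpenMP offloading toolchain.
void process_collision_events(Particle* particles, int n_particles,
                              const int32_t* queue, int n_queue) {
  #pragma omp target teams distribute parallel for \
      map(to: queue[0:n_queue]) map(tofrom: particles[0:n_particles])
  for (int i = 0; i < n_queue; ++i) {
    Particle& p = particles[queue[i]];
    // Placeholder physics: a real code samples a reaction from cross-section
    // data here; we just degrade the energy and kill low-energy particles.
    p.E *= 0.5;
    if (p.E < 1.0e-5) p.alive = false;
  }
}
```

In a full event-based loop, separate queues (and kernels) would exist for cross-section lookup, surface crossing, and collision handling, with queues rebuilt each iteration until all particles in flight have terminated.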
Stats
"The largest run in this paper (2048 nodes of the Frontier supercomputer using MI250X GPUs), an inactive batch rate of 1.65 billion particles/sec was observed." "On Intel GPUs, we pinned the number of particles in-flight to 1.125 million for all of our runs."
Quotes
"The world’s first exascale supercomputer, Oak Ridge National Laboratory’s “Frontier”, features MI250X GPUs from AMD." "OpenMC's theoretical portability advantage is clear."

Deeper Inquiries

How does the performance of OpenMC on CPUs compare to other state-of-the-art MC transport applications?

OpenMC's CPU performance is competitive with other state-of-the-art MC transport applications. In the study, OpenMC came within about 35% of the fastest CPU code tested, comparing favorably with established codes such as Serpent, MCNP, and SCONE. The comparison was based on a standard depleted pincell problem with 251 fuel nuclides, compiled with optimization flags enabled using the GNU compiler and run on a dual-socket Xeon Platinum node.

What are the implications of the sudden diversification in the GPU market for scientific simulation application development?

The sudden diversification of the GPU market has significant implications for scientific simulation application development. The field was previously dominated by NVIDIA GPUs and their proprietary CUDA programming model, but the deployment of AMD MI250X GPUs in Oak Ridge National Laboratory's "Frontier" and Intel PVC Max 1550 GPUs in Argonne National Laboratory's "Aurora" marks a shift in the GPUs used by supercomputers. This diversification poses challenges because CUDA is not supported across all systems, creating portability problems for existing GPU codes. Portable GPU programming models such as OpenMP target offloading, HIP, SYCL/DPC++, and Kokkos are being explored as alternatives to CUDA, but these models vary in vendor support and maturity. Given this shift toward multiple GPU vendors, porting and optimizing legacy CPU-based applications for diverse GPU architectures has become crucial.
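
To illustrate why OpenMP target offloading is attractive as a portable model, consider the minimal sketch below (illustrative only, not taken from OpenMC): a single directive offloads the loop, and the same source can in principle be built for NVIDIA, AMD, or Intel GPUs simply by switching compiler and offload flags. The flags in the comments are representative of typical toolchains and vary by compiler version.

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int n = 1 << 20;
  std::vector<double> x(n, 1.0), y(n, 2.0);
  double* px = x.data();
  double* py = y.data();

  // One portable directive; the GPU backend is chosen at compile time, e.g.
  //   NVIDIA: nvc++ -mp=gpu  (or clang++ -fopenmp --offload-arch=sm_80)
  //   AMD:    amdclang++ -fopenmp --offload-arch=gfx90a
  //   Intel:  icpx -fiopenmp -fopenmp-targets=spir64
  #pragma omp target teams distribute parallel for \
      map(to: px[0:n]) map(tofrom: py[0:n])
  for (int i = 0; i < n; ++i) {
    py[i] += 2.0 * px[i];  // simple axpy-style update executed on the device
  }

  std::printf("y[0] = %f\n", py[0]);  // expect 4.0
  return 0;
}
```

The trade-off, as noted above, is that compiler maturity and achieved performance for this directive differ across vendors, which is exactly what the paper's cross-platform benchmarks probe.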

How might advancements in GPU technology impact future simulations beyond what is currently achievable?

Advancements in GPU technology have profound implications for simulations beyond current capabilities. Results from running OpenMC on modern leadership-class supercomputing systems show substantial speedups from GPUs relative to traditional CPUs; for instance, single Aurora or Frontier GPU nodes significantly outperformed high-end dual-socket CPU nodes. With continued improvements in GPU architecture and computational power roughly doubling every few years (as seen between the NVIDIA P100 and A100), there is great potential to increase simulation fidelity and complexity while drastically reducing computation time. Future simulations could combine advanced physics features such as probability tables and S(α,β) thermal scattering with efficient cross-section lookup techniques based on multipole representations and Faddeeva function approximations, as implemented in Monte Carlo particle transport codes like OpenMC. These advances would let researchers tackle more complex problems efficiently, such as full-core reactor simulations with hundreds of thousands of unique material regions, with speed and accuracy previously unattainable through CPU-only computation.
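
As a rough illustration of the multipole idea mentioned above (a sketch under stated assumptions, not OpenMC's implementation), a 0 K cross section can be written as a sum of pole/residue terms in sqrt(E); at finite temperature each term is Doppler-broadened analytically by evaluating the Faddeeva function w(z) instead of the bare pole. Sign and normalization conventions differ between formulations, and the pole data below is fabricated purely for demonstration.

```cpp
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

// A pole/residue pair in sqrt(E) space (values used below are illustrative only).
struct Pole {
  std::complex<double> p;  // pole location
  std::complex<double> r;  // residue
};

// 0 K multipole evaluation: sigma(E) ~ (1/E) * sum_j Re[ r_j / (p_j - sqrt(E)) ].
// At temperature T, each term would instead use the Faddeeva function w(z),
// broadening the resonance analytically rather than via tabulated interpolation.
double multipole_xs_0K(double E, const std::vector<Pole>& poles) {
  const double u = std::sqrt(E);
  double sigma = 0.0;
  for (const Pole& pole : poles) {
    sigma += std::real(pole.r / (pole.p - u));
  }
  return sigma / E;
}

int main() {
  // Single fabricated resonance near E = 6.67 eV (numbers are made up).
  std::vector<Pole> poles = {{{std::sqrt(6.67), 1.0e-3}, {0.0, 5.0}}};
  for (double E : {1.0, 6.67, 20.0}) {
    std::printf("E = %6.2f eV  sigma ~ %g (arbitrary units)\n",
                E, multipole_xs_0K(E, poles));
  }
  return 0;
}
```

In the windowed variant of this representation, only poles near the target energy are summed and a smooth curve fit captures the remainder, which is what keeps per-lookup cost low on both CPUs and GPUs.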