Optimization Advice for MPI+OpenMP Hybrid Parallelism in AMReX #4497

paradoxknight1 · 2025-06-10T14:42:24Z

paradoxknight1
Jun 10, 2025

I'm currently evaluating the performance trade-offs between hybrid (MPI+OpenMP) and pure MPI parallelism in AMReX, and the results are quite perplexing.

As I begin working with the AmrLevel interface to develop my solver, initial benchmarks on a dual-socket Xeon Gold 6130 system (32 cores total) reveal that pure MPI consistently outperforms hybrid parallelism. In our Sod shock tube test cases, pure MPI demonstrates a 30% speed advantage over the hybrid approach.

The hybrid configuration was set up as follows:
`
export OMP_NUM_THREADS=16
export OMP_PLACES=cores
export OMP_PROC_BIND=close
mpirun -np 2 --bind-to socket --map-by socket ./main.ex inputs

`
To rule out implementation errors, I benchmarked AMReX's official Advection example and observed even more pronounced differences:

Pure MPI (32 processes): 5.434s per 2000 steps
Hybrid (2 MPI × 16 threads): 10.01s per 2000 steps

It is a quite annoying issue to me.

Looking forward to someone who can give me some advice, really.

zingale · 2025-06-10T16:17:56Z

zingale
Jun 10, 2025
Collaborator

Is your simulation 3D? Also how big are your boxes? In general, using tiling + OpenMP benefits from large boxes (increase max_grid_size and blocking_factor).

7 replies

paradoxknight1 Jun 10, 2025
Author

You can try to experiment with the number of threads. There might be a sweet spot between 1 thread and the maximum number of threads possible.

Is it to say that for a Hybrid parallelism, it may perform better even without using the entire thread from the CPU
Like in my cases, 2 MPI × 8 threads ( Hybrid ) might be better than 32 threads (MPI only) ?

Thanks for your timely reply.

WeiqunZhang Jun 10, 2025
Maintainer

I am saying 4 MPI ranks x 8 OMP threads might be better than both 1 MPI ranks x 32 OMP threads and 32 MPI ranks x 1 OMP thread.

zingale Jun 10, 2025
Collaborator

that's been my experience as well. 4 MPI x 8 OMP has worked well in the past, with boxes that are at least 32**3

rmrsk Jun 10, 2025

What does OpenMP only give you?

Does mpirun --report-bindings show that threads are placed where you want them to be?

And can you profile and figure out which functions are choking?

Maybe Weiqun could tell you if you are reaching into code that is not threaded (e.g., allocating and filling MPI buffers for send/receives, mesh generation, etc).

paradoxknight1 Jun 11, 2025
Author

Here is the latest test result, by using the amrcore interface version of Advection
For Hybrid I test 2 MPI x 16 Threads, binding with socket
export OMP_NUM_THREADS=16 mpirun -n 2 --bind-to core --map-by ppr:1:socket:PE=16 --report-bindings ./main2d.gnu.MPI.OMP.ex inputs

and the calculation time is 278.1662718 s

4 MPI x 8 Threads
export OMP_NUM_THREADS=8 mpirun -n 4 --bind-to core --map-by ppr:2:socket:PE=8 --report-bindings ./main2d.gnu.MPI.OMP.ex inputs

and the calculation time is 275.3203238 s

However for pure mpi tests

The calculation time is 56.08625729 s

May I ask how about the MPI+openMP perform in gerernal.

Here is the inputs file.
inputs.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimization Advice for MPI+OpenMP Hybrid Parallelism in AMReX #4497

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 7 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Optimization Advice for MPI+OpenMP Hybrid Parallelism in AMReX #4497

Uh oh!

paradoxknight1 Jun 10, 2025

Replies: 1 comment · 7 replies

Uh oh!

zingale Jun 10, 2025 Collaborator

Uh oh!

paradoxknight1 Jun 10, 2025 Author

Uh oh!

WeiqunZhang Jun 10, 2025 Maintainer

Uh oh!

zingale Jun 10, 2025 Collaborator

Uh oh!

rmrsk Jun 10, 2025

Uh oh!

Uh oh!

paradoxknight1 Jun 11, 2025 Author

paradoxknight1
Jun 10, 2025

Replies: 1 comment 7 replies

zingale
Jun 10, 2025
Collaborator

paradoxknight1 Jun 10, 2025
Author

WeiqunZhang Jun 10, 2025
Maintainer

zingale Jun 10, 2025
Collaborator

paradoxknight1 Jun 11, 2025
Author