https://github.com/sstsimulator/ember
Ember Communication Patterns
- Host: GitHub
- URL: https://github.com/sstsimulator/ember
- Owner: sstsimulator
- License: bsd-3-clause
- Created: 2017-10-24T00:54:31.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-07-17T15:49:12.000Z (almost 2 years ago)
- Last Synced: 2025-06-18T22:34:34.245Z (about 1 year ago)
- Topics: benchmarks, communication, motifs, mpi, openshmem, patterns, shmem, simulation, snl-mini-apps
- Language: C
- Homepage:
- Size: 84 KB
- Stars: 11
- Watchers: 19
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.MPI.halo3d
- License: LICENSE
README
Communication Motif: Halo3D
Description:
Nearest-neighbor communication is *very* common in scalable DOE
applications. In this pattern, each MPI rank communicates with the ranks
adjacent to it in each Cartesian dimension, and the "halo" exchanged is
the data on each face of the rank's local grid. The Halo3D pattern
included in Ember is the simplest representation of this communication
approach and represents structured codes, i.e., codes that have
well-defined problem dimensions and a regular structure.
In most (although not all) DOE implementations of Halo3D, an
MPI_Allreduce operation is executed every n iterations (in some cases
n=1) to compute a sum, min, or max over the global problem domain. This
reduction is *not* included in the Ember implementation, so that the
motif remains broadly applicable.
Parameters for the Halo3D Motif:
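mpirun ./halo3d \
-nx <local grid size in X> \
-ny <local grid size in Y> \
-nz <local grid size in Z> \
-pex <number of ranks in X> \
-pey <number of ranks in Y> \
-pez <number of ranks in Z> \
-iterations <number of iterations> \
-vars <variables per grid cell> \
-sleep <sleep time>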
Example: a 256-rank run on an 8x8x4 process grid with a local (per-rank) data grid of 20x20x20:
mpirun -n 256 ./halo3d \
-nx 20 \
-ny 20 \
-nz 20 \
-pex 8 \
-pey 8 \
-pez 4 \
-iterations 100 \
-vars 8 \
-sleep 2000
Output:
Example:
# Time KBytesXchng/Rank-Max MB/S/Rank
0.013865 150.0000 10818.6126
When run, the motif reports the time taken and the number of KB sent
and received by a rank in the middle of the processor grid (ranks on the
edge of the grid exchange less data because they have no neighbor on
some faces). It also reports the measured bandwidth for that middle
rank.