https://github.com/tacc/amask

Host: GitHub
URL: https://github.com/tacc/amask
Owner: TACC
License: mit
Created: 2017-05-01T16:37:34.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2024-07-16T12:25:34.000Z (11 months ago)
Last Synced: 2024-07-16T15:58:04.741Z (11 months ago)
Language: C
Size: 1.03 MB
Stars: 8
Watchers: 10
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README
- License: License

README

The Fortran API is not ready yet.

Amask was originally designed just to display the affinity (mask) of processes that
are launched by an MPI executable launcher (mpirun, mpiexec, TACC ibrun, etc.).
When maskeraid is built, it creates amask_mpi. This MPI executable
is then substituted for the researcher's MPI application executable in a test run
to discover how the launcher would have distributed the application's tasks (ranks).
In a job script, the user would simply replace the application executable name
with amask_mpi (you may need to include a path to the executable location), as
shown below:

#!/bin/bash
#SBATCH -n 16 -N 1
...
#Batch Script for TACC machine
...
#ibrun ./my_mpi_exec
ibrun ./amask_mpi

OUTPUT 1
rank |    0    |   10    |
0000 0---------------
0001 -1--------------
0002 --2-------------
0003 ---3------------
0004 ----4-----------
0005 -----5----------
0006 ------6---------
0007 -------7--------
0008 --------8-------
0009 ---------9------
0010 ----------0-----
0011 -----------1----
0012 ------------2---
0013 -------------3--
0014 --------------4-
0015 ---------------5

The whereami puts a load on the ranks for 10 seconds, so that top can be run on
a node to observe the core occupation. (Execute "top", and then press the
1 key to change the display to show the percentage load on each core.) You can
change the load time by setting the MASKERAID_LOAD_SECONDS environment
variable to a positive integer (units are seconds).
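
As an illustration of how a load time like this can be taken from the
environment, here is a minimal C sketch. It is not Amask's actual
implementation: the helper names parse_load_seconds, load_seconds_from_env and
busy_for_seconds are hypothetical, and the 10-second fallback used in the
usage note simply mirrors the default described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: convert text to a positive number of seconds.
   Returns fallback when text is missing, not a number, or not positive. */
int parse_load_seconds(const char *text, int fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value <= 0 || value > INT_MAX)
        return fallback;
    return (int)value;
}

/* Hypothetical helper: look up an environment variable such as
   MASKERAID_LOAD_SECONDS and parse it with the rules above. */
int load_seconds_from_env(const char *name, int fallback)
{
    assert(name != NULL);
    return parse_load_seconds(getenv(name), fallback);
}

/* Hypothetical helper: keep the caller busy with integer additions for
   roughly `seconds` of wall-clock time. time() has one-second resolution,
   so the real duration can be up to one second shorter. Returns the number
   of loop iterations performed. */
unsigned long busy_for_seconds(int seconds)
{
    assert(seconds >= 0);
    volatile unsigned long counter = 0;
    time_t stop = time(NULL) + seconds;
    while (time(NULL) < stop)
        counter += 1;
    return counter;
}
```

A caller would combine the two pieces as
busy_for_seconds(load_seconds_from_env("MASKERAID_LOAD_SECONDS", 10)).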

The resulting output for Stampede, shown in Output 1, is a matrix whose rows are labeled by the ranks
of the amask_mpi MPI tasks for the above launch. The columns represent CPU_ids
(core ids), with positions representing the CPU_ids from 0 to ntasks-1, beginning
with the first column of the matrix.

A non-hyphen character in the matrix means that
the MPI rank (row value) is allowed to execute on the CPU_id (column value). In
Output 1, rank 0 is "mapped" to core (CPU_id) 0, rank 1 is mapped to
core (CPU_id) 1, etc., so that each rank executes on only one core (CPU_id).

To make it easy to determine the CPU_id value in the matrix, the units digit of the
CPU_id (core) number is printed in the matrix. For CPU_ids larger than 9, the
"10s" position value can be read from the column header, where each set of columns
is demarcated by bars (|). For instance, the second set of values {0, 1, 2, 3, 4, 5} along the
diagonal represents CPU_ids {10, 11, 12, 13, 14, 15}: the "10" in the header
"| 10 |" should be added to the units digit within the matrix to get the
CPU_id value.
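
The digit convention described above can be sketched in a few lines of C. This
is only an illustration of the row layout seen in Output 1, not code taken from
Amask: the function names are hypothetical, and the mask is passed in as a
plain array of flags (nonzero = allowed) rather than queried from the system.

```c
#include <assert.h>
#include <stdio.h>

/* Render one mask row: position cpu holds the units digit of cpu when
   allowed[cpu] is nonzero, and '-' otherwise.
   out must have room for ncpus + 1 characters. */
void format_mask_row(const int *allowed, int ncpus, char *out)
{
    assert(allowed != NULL && out != NULL && ncpus >= 0);
    for (int cpu = 0; cpu < ncpus; cpu++)
        out[cpu] = allowed[cpu] ? (char)('0' + cpu % 10) : '-';
    out[ncpus] = '\0';
}

/* Recover a CPU_id from the column-group value in the header
   (0, 10, 20, ...) and the digit printed in the matrix. */
int cpu_id_from_cell(int group_base, char digit)
{
    assert(digit >= '0' && digit <= '9');
    return group_base + (digit - '0');
}

/* Print a row with a four-digit rank label, as in Output 1. */
void print_mask_row(int rank, const int *allowed, int ncpus)
{
    char row[1025];
    assert(ncpus >= 0 && ncpus <= 1024);
    format_mask_row(allowed, ncpus, row);
    printf("%04d %s\n", rank, row);
}
```

For a 16-column mask with only CPU_id 12 allowed, format_mask_row produces the
same row that Output 1 shows for rank 12, and cpu_id_from_cell(10, '2') gives
12 back.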

Put the output HERE....

------------------------------------------------------------------------------------

Performance might be improved by pinning each OpenMP thread to a single core. This can
be done in several ways:
1.) Masking commands for each thread can be inserted after
the threads are generated in an OpenMP parallel region.
2.) KMP variables can be set for Intel-compiled code with OpenMP directives.
3.) The OpenMP 4.0 affinity directives and environment variables can be
used to map threads to "places".
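
Option 1 above can be sketched on Linux with the sched_setaffinity and
sched_getaffinity calls from glibc. This is a minimal sketch under those
assumptions (Linux, glibc, _GNU_SOURCE), not the way Amask or its
map_to_procid utility is implemented; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU_id.
   Returns 0 on success and -1 on failure. */
int pin_calling_thread(int cpu_id)
{
    cpu_set_t mask;

    if (cpu_id < 0 || cpu_id >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&mask);
    CPU_SET(cpu_id, &mask);

    /* A pid of 0 selects the calling thread. */
    return sched_setaffinity(0, sizeof(mask), &mask);
}

/* Hypothetical helper: return the lowest CPU_id the calling thread is
   currently allowed to run on, or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}
```

Inside an OpenMP parallel region, each thread would call pin_calling_thread
with its own target CPU_id. How thread numbers are assigned to CPU_ids is left
to the application.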

In hybrid code it may be necessary to turn off MPI affinity settings when using OpenMP.
With MVAPICH2, this is done in the tacc_affinity scripts.

----------------------------------------------------------------

Instrumenting Your Own Application. FORTRAN IS NOT READY YET

Important elements of Amask were turned into a library, so that users
can instrument their own applications to display the masks of MPI tasks
and OpenMP threads. Instrumentation is simple: just add a single call to
an MPI or OpenMP code after initialization or the parallel directive, respectively.
Hybrid codes require 2 API calls: one in the pure MPI region, and another
in the hybrid region (within an OpenMP parallel region). Simple cases are shown below:

MPI code:

     C                                      Fortran
     MPI_Init(NULL,NULL);                   call MPI_Init()
     ierr = amask_mpi();                    ierr = amask_mpi()
     ...                                    ...

OMP code:

     C                                      Fortran
     #pragma omp parallel private(ierr)     !$omp parallel private(ierr)
     ierr = amask_omp();                    ierr = amask_omp()
     ...                                    ...

Hybrid code:

     C                                      Fortran
     MPI_Init_thread(...);                  call MPI_Init_thread()
     ierr = amask_mpi();                    ierr = amask_mpi()
     ...                                    ...
     #pragma omp parallel private(ierr)     !$omp parallel private(ierr)
     ierr = amask_hybrid();                 ierr = amask_hybrid()
     ...                                    ...

There are other utilities in the library:

load_cpu_nsec( nsec )    --- Loads up a process/thread for nsec seconds (int operations)
gtod_timer()             --- Easy to use Get Time of Day clock
map_to_procid(proc_id)   --- Sets process/thread map to a single core (proc_id)
tsc()                    --- Time Stamp Counter

See code for details.
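
For readers unfamiliar with a "get time of day" clock, the idea can be
sketched with the POSIX gettimeofday call. This is an assumption about the
general technique only: the function name wall_clock_seconds is hypothetical,
and the actual signature and behavior of gtod_timer should be taken from the
library source.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current wall-clock time in seconds, with
   microsecond resolution, as a double. */
double wall_clock_seconds(void)
{
    struct timeval now;
    int status = gettimeofday(&now, NULL);

    assert(status == 0);
    (void)status; /* keeps builds with NDEBUG free of unused warnings */

    return (double)now.tv_sec + (double)now.tv_usec * 1.0e-6;
}
```

Elapsed time for a region of code is then the difference between two calls,
one before and one after the region.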
