Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/pruners/rempi

ReMPI (MPI Record-and-Replay)
https://github.com/pruners/rempi

clock-delta-compression debugging debugging-tool mpi record-and-replay reproducibility

Last synced: about 2 months ago
JSON representation

ReMPI (MPI Record-and-Replay)

Awesome Lists containing this project

README

        

[![Build Status](https://travis-ci.org/PRUNERS/ReMPI.svg?branch=master)](https://travis-ci.org/PRUNERS/ReMPI)

ReMPI logo

# Introduction

* ReMPI is a record-and-replay tool for MPI+OpenMP applications written in C/C++ and/or fortra
* In a broad sense, "ReMPI" means a record-and-replay tool for MPI+OpenMP applications
* In a narrow sense, "ReMPI" means MPI record-and-replay and "ReOMP" means OpenMP record-and-replay
* (Optional) ReMPI implements Clock Delta Compression (CDC) for compressing records.

# Quick Start

## 1. Building ReMPI

### From Spack

$ git clone https://github.com/LLNL/spack
$ ./spack/bin/spack install rempi

### From git repository

$ git clone [email protected]:PRUNERS/ReMPI.git
$ cd ReMPI
$ ./autogen.sh
$ ./configure --prefix=
$ make
$ make install

### From tarball

$ tar zxvf ./rempi_xxxxx.tar.bz
$ cd
$ ./configure --prefix=
$ make
$ make install

### Note on building for BG/Q

To build on the IBM BG/Q platform, you will need to add the --with-blugene option and specify the path to zlib with the --with-zlib-static flag. You may also need to specify the MPICC and MPIFC variables. For example:

$ ./configure --prefix= --with-bluegene --with-zlib-static=/usr/local/tools/zlib-1.2.6/ MPICC=/usr/local/tools/compilers/ibm/mpicxx-fastmpi-mpich-312 MPIFC=/usr/local/tools/compilers/ibm/mpif90-fastmpi-mpich-312
$ make
$ make install

## 2. Running with ReMPI
$ cd test/rempi
$ mkdir rempi_record

### Record mode (REMPI_MODE=0)

$ REMPI_MODE=0 REMPI_DIR=./rempi_record LD_PRELOAD=/lib/librempi.so srun(or mpirun) -n 4 ./rempi_test_units matching

For its convenience, ReMPI also provides a wapper script which execute the same command as the above. If you install ReMPI to a custom directory, you need to add "/bin/" path to the PATH environment variable.

$ rempi_record srun(or mpirun) -n 4 ./rempi_test_units matching

ReMPI produces one file per MPI process.


### Replay mode (REMPI_MODE=1)

$ REMPI_MODE=1 REMPI_DIR=./rempi_record LD_PRELOAD=/lib/librempi.so srun(or mpirun) -n 4 ./rempi_test_units matching

For its convenience, ReMPI also provides a wapper script which execute the same command as the above

$ rempi_replay srun(or mpirun) -n 4 ./rempi_test_units matching

"REMPI::: 0: Global validation code: 1939202000" is a hash value computed based on the order of MPI events (e.g., Message receive order, message test results and etc.). If you run this example code several times with REMPI_MODE=0, you will see that this hash value changes from run to run. This means this example code is MPI non-deterministic. Once you run this example code and record MPI events with REMPI_MODE=0, you can reproduce this hash value with REMPI_MODE=1. This means MPI events are reproduced.

## 3. Running other examples

The following example script assumes the resource manager is SLURM and that ReMPI is installed in /usr/local. You must edit the example_x86.sh file othewise.

cd example
sh ./example_x86.sh 16
ls -ltr .rempi # lists record files

## 4. Running with ReOMP
Let us take the program below and follow the steps to compile, run the proram, record and replay.
This example code is in test/reomp/reomp_example.cpp and the seriease of the steps are scripted in test/reomp/build_run_reomp_example.sh

#include
#include
#include
#include

static int reomp_example_omp_critical(int nth)
{
uint64_t i;
volatile int sum;
#pragma omp parallel for private(i)
for (i = 0; i < 10000000L / nth; i++) {
#pragma omp critical
{
sum = sum * omp_get_thread_num() + 1;
}
}
return sum;
}

static int reomp_example_data_race(int nth)
{
uint64_t i;
volatile int sum = 1;
#pragma omp parallel for private(i)
for (i = 0; i < 3000000L / nth ; i++) {
sum += nth;
}
return sum;
}

int main(int argc, char **argv)
{
int nth = atoi(argv[1]);
omp_set_num_threads(nth);
int ret1 = reomp_example_omp_critical(nth);
int ret2 = reomp_example_data_race(nth);
fprintf(stderr, "omp_critical: ret = %15d\n", ret1);
fprintf(stderr, "data_race: ret = %15d\n", ret2);
return 0;
}

First let's compile and run without ReOMP.
Note that two functions, reomp_example_omp_critical and reomp_example_data_race, return non-deterministic values (i.e., sum).
If you run the program several times, you will see the different numerical results from run to run.
In reomp_example_omp_critical, the numerical resutls changes depending on the order of threads entering the critical section.
In reomp_example_data_race, the non-deterministic numerical reuslts are produceds due to data races.

$ clang++ -O3 -fopenmp -o reomp_example_without_reomp reomp_example.cpp
$ ./reomp_example_without_reomp 16 # 16 is the number of threads
omp_critical: ret = 17116
data_race: ret = 191889
$ ./reomp_example_without_reomp 16
omp_critical: ret = -456407940
data_race: ret = 188801

To reproduce the numerical results, compile the program with the ReOMP IR pass shared library.
Now, we can reproduce the numerical reuslt in reomp_example_omp_critical since ReOMP find the critical sections and record the order of threads entering the critical sections.
However, we still see inconsistent numerical results in reomp_example_data_race sicne ReOMP itself cannnot find where the data races occur.

$ clang++ -Xclang -load -Xclang ../../src/reomp/.libs/libreompir.so -L../../src/reomp/.libs/ -lreomp -O3 -fopenmp -o reomp_example_with_reomp reomp_example.cpp
$ export LD_LIBRARY_PATH=../../src/reomp/.libs/
$ REOMP_MODE=0 ./reomp_example_with_reomp 16 # REOMP_MODE=0 means the ReOMP record mode.
omp_critical: ret = -2116977392
data_race: ret = 198769
$ REOMP_MODE=1 ./reomp_example_with_reomp 16 # REOMP_MODE=0 means the ReOMP record mode.
omp_critical: ret = -2116977392
data_race: ret = 187489

ReOMP replys on a data race detector to find data races.
Let's detect the data races with Thread Sanitizer (or Archer).

$ clang++ -g -fomit-frame-pointer -fsanitize=thread -O3 -fopenmp -o reomp_example_with_tsan reomp_example.cpp
$ export 'TSAN_OPTIONS=log_path=reomp_tsan.log history_size=7'
$ ./reomp_example_with_tsan 2

Let's re-compile the probram with ReOMP IR pass and the report file (reomp_tsan.log.xxxxx) from Thread Sanitizer and run.
Now, you will see the consistent numerical resutls from run to run.

$ export TSAN_OPTIONS=log_path=reomp_tsan.log # To let the ReOMP IR pass know where the TSAN report file is.
$ clang++ -Xclang -load -Xclang ../../src/reomp/.libs/libreompir.so -L../../src/reomp/.libs/ -lreomp -L/usr/tce/packages/clang/clang-4.0.0/lib -O3 -fopenmp -o reomp_example_with_reomp_data_race reomp_example.cpp
$ REOMP_MODE=0 ./reomp_example_with_reomp_data_race 16
omp_critical: ret = -1833974251
data_race: ret = 191793
$ REOMP_MODE=1 ./reomp_example_with_reomp_data_race 16
omp_critical: ret = -1833974251
data_race: ret = 191793
$ REOMP_MODE=1 ./reomp_example_with_reomp_data_race 16
omp_critical: ret = -1833974251
data_race: ret = 191793

# Environment variables
## ReMPI
* `REMPI_MODE`: Record mode OR Replay mode
* `0`: Record mode
* `1`: Replay mode
* `REMPI_DIR`: Directory path for record files
* `REMPI_ENCODE`: Encoding mode
* `0`: Simple recording
* `1`: `0` + record format optimization
* `2` and `3`: (Experimental encoding)
* `4`: Clock Delta Compression (only when built with `--enable-cdc` option)
* `5`: Same as `4` (only when built with `--enable-cdc` option)
* `REMPI_GZIP`: Enable gzip compression
* `0`: Disable zlib
* `1`: Enable zlib
* `REMPI_TEST_ID`: Enable Matching Function (MF) Identification
* `0`: Disable MF Identification
* `1`: Enable MF Identification

By default, ReMPI stores record files to the current working directory. If you want to change the record directory (e.g., /tmp), use the REMPI_DIR environment variable.

$ rempi_record REMPI_DIR=/tmp srun(or mpirun) -n 4 ./rempi_test_units matching
$ rempi_replay REMPI_DIR=/tmp srun(or mpirun) -n 4 ./rempi_test_units matching

Record data is all interger values. If you enables gzip compression capability via REMPI_GZIP, you can reduce the record size while a certain runtime overhead due to compression engine.

$ rempi_record REMPI_DIR=/tmp REMPI_GZIP=1 srun(or mpirun) -n 4 ./rempi_test_units matching
$ rempi_replay REMPI_DIR=/tmp REMPI_GZIP=1 srun(or mpirun) -n 4 ./rempi_test_units matching

## ReOMP
* `REOMP_MODE`: Record mode OR Replay mode
* `0` or `record`: Record mode
* `1` or `replay`: Replay mode
* `2` or `diable`: Disable ReOMP (Run your applicaiton with instrumented binary but ReOMP doest not record adn replay anything)
* `REOMP_DIR`: Directory path for record files (Default is current directory)
* `REOMP_METHOD`: Record-and-Replay method
* `0`: Distributed epoch reocrding (default)
* `1`: Distributed clock recording
* `2`: Serialized thread ID recording

# Non-determinism that ReMPI records and relays
ReMPI record and replay results of following MPI functions.

### MPI: Blocking Receive

* MPI_Recv

### MPI: Message Completion Wait/Test

* MPI_{Wait|Waitany|Waitsome|Waitall}
* MPI_{Test|Testany|Testsome|Testall}

In current ReMPI, MPI_Request must be initialized by following "Supported" MPI functions. Wait/Test Message Completion functions using MPI_Request initializaed by "Unsupported" MPI functions are not recorded and replayed (Unsupported MPI functions will be supporeted in future).

* Supported
* MPI_Irecv
* MPI_{Isend|Ibsend|Irsend|Issend}
* Unsupported
* MPI_Recv_init
* MPI_{Send|Ssend|Rsend|Bsend}_init
* MPI_{Start|Startall}
* All non-blocking collectives (e.g., MPI_Ibarrier)

### MPI: Message Arrival Probe

* MPI_{Probe|Iprobe}

### MPI: Other sources of non-determinism

Current ReMPI version record and replay only MPI and does not record and repaly other sources of non-determinism suca as OpenMP and other non-deterministic libc functions (e.g., gettimeofday(), clock() and etc.).

### OpenMP:
ReOMP records and replays
* OpenMP clauses
* Critical Section (#omp critical)
* Reduction (#omp reduction)
* Master (#omp master)
* Single (#omp single)
* OpenMP runtime
* omp_set_lock() and omp_unset_lock()
* omp_set_nest_lock() and omp_unset_nest_lock()
* Atomic instructions
* Atomic load/store
* Atomic operations (cmpxchg and atomicrmw)
* Data-racy load/store instructions (If TSAN data-race report files are provided when compiling)

# Using ReMPI with TotalView

Since ReMPI is implemented via a PMPI wrapper, ReMPI works with Totalvew (Parallel debugger). The common use case is that you first record a buggy behavior in ReMPI record mode without TotalView and then replay this buggy behavior with TotalView in ReMPI replay mode. There are two methods to use ReMPI with TotalView.

* Command Line Options: http://docs.roguewave.com/codedynamics/2017.0/html/index.html#page/TotalViewLH/TotalViewCommandLineOptions.html

### Method 1: Command line

You can simply launch the TotalVew GUI with the "totalview -args" command. (LD_PRELOAD must be set thorught a TotalView command line option: -env variable=value)

$ REMPI_MODE=1 REMPI_DIR=./rempi_record totalview -env LD_PRELOAD=/lib/librempi.so -args srun(or mpirun) -n 4 ./rempi_test_units matching

or

$ export REMPI_MODE=1
$ export REMPI_DIR=./rempi_record
$ totalview -env LD_PRELOAD=/lib/librempi.so -args srun(or mpirun) -n 4 ./rempi_test_units matching

For its convenience, ReMPI provides a wapper script to lunch Totaiveiw with ReMPI.

Firs, record a particular execution that you want to diagnose with Totaiview

$ rempi_record srun -n 4 ./rempi_test_units matching

Then, diagnose this recorded execution with Totalview under ReMPI replay

$ rempi_replay totalview -args srun -n 4 ./rempi_test_units matching

### Method 2: GUI

You can also set the REMPI_MODE, REMPI_DIR and LD_PRELOAD variable after launching TotalView.
(Step 0) Record a particular execution that you want to diagnose with Totalview
(Step 1) Run your application with TotalView

$ REMPI_MODE=1 totalview -args srun(or mpirun) -n 4 ./rempi_test_units matching

(Step 2) Select [Process] => [Startup Parameters] in the GUI menu, and then select [Arguments] tab

(Step 3) Specify the environment variables in the "Environment variables" textbox (One environment variable per line)

LD_PRELOAD=/lib/librempi.so

(Step 4) Press "Run" button to execute

# Configuration Options

For more details, run ./configure -h

* `--enable-cdc`: (Optional) enables CDC (clock delta compression), and output librempix.a and .so. When CDC is enabled, ReMPI requires MPI3 and below two software
* `--with-stack-pmpi`: (Required when `--enable-cdc` is specified) path to stack_pmpi directory (STACKP)
* `--with-clmpi`: (Required when `--enable-cdc` is specified) path to CLMPI directory
* `--with-bluegene`: (Required in BG/Q) build codes with static library for BG/Q system
* `--with-zlib-static`: (Required in BG/Q) path to installation directory for libz.a

When the `--enable-cdc` option is specified, ReMPI require dependent software below:

* STACKP: A static MPI tool enabling to run multiple PMPI tools.
* https://github.com/PRUNER/StackP.git
* CLMPI: A PMPI tool for piggybacking Lamport clocks.
* https://github.com/PRUNER/CLMPI.git

# References

* Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, and Martin Schulz. 2015. Clock delta compression for scalable order-replay of non-deterministic parallel applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). ACM, New York, NY, USA, , Article 62 , 12 pages. DOI=http://dx.doi.org/10.1145/2807591.2807642