# ReproMPI Benchmark (Development Version)


## Introduction

The ReproMPI Benchmark is a tool designed to accurately measure the
run-time of MPI blocking collective operations. It provides multiple
process synchronization methods and a flexible mechanism for
predicting the number of measurements that are sufficient to obtain
statistically sound results.

## References

1. Sascha Hunold, Alexandra Carpen-Amarie:
   On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives. EuroMPI 2015: 8:1-8:10
2. Sascha Hunold, Alexandra Carpen-Amarie, Jesper Larsson Träff:
   Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think. EuroMPI/ASIA 2014: 69
3. Sascha Hunold, Alexandra Carpen-Amarie:
   Reproducible MPI Benchmarking is Still Not as Easy as You Think. IEEE Trans. Parallel Distributed Syst. 27(12): 3617-3630 (2016)
4. Sascha Hunold, Alexandra Carpen-Amarie:
   Hierarchical Clock Synchronization in MPI. CLUSTER 2018: 325-336
5. Sascha Hunold, Alexandra Carpen-Amarie:
   Autotuning MPI Collectives using Performance Guidelines. HPC Asia 2018: 64-74
6. Joseph Schuchart, Sascha Hunold, George Bosilca:
   Synchronizing MPI Processes in Space and Time. EuroMPI 2023: 7:1-7:11

## Components

- `mpibenchmark`: actual MPI benchmark for collectives
- [`pgchecker`](https://github.com/hunsa/reprompi/tree/main/src/pgcheck/): performance guideline checker

## Installation

- Prerequisites
  - an MPI library
  - CMake (version >= 3.0)
  - GSL libraries

### Basic installation

```
  cd $BENCHMARK_PATH
  cmake .
  make
```

For specific configuration options, see the *List of Compilation Flags* section.

## Running the ReproMPI Benchmark

The ReproMPI code is designed to serve two specific purposes:

### Benchmarking of MPI collective calls

The most common usage scenario of the benchmark is to specify an MPI
collective function to be benchmarked, a (list of) message sizes, and
the *number of measurement repetitions* for each test, as in the
following example.

```
mpirun -np 4 ./bin/mpibenchmark --calls-list=MPI_Bcast,MPI_Allgather \
             --msizes-list=8,1024,2048 --nrep=10
```


## Command-line Options

### Common Options

  - `-h` print help
  - `-v` print the run-times measured on each process
  - `--msizes-list=<values>` list of comma-separated message sizes in
    bytes, e.g., `--msizes-list=10,1024`
  - `--msize-interval=min=<min>,max=<max>,step=<step>` list of
    power-of-2 message sizes between $2^{min}$ and $2^{max}$, where
    consecutive sizes differ by a factor of $2^{step}$, e.g.,
    `--msize-interval=min=1,max=4,step=1`
  - `--calls-list=<args>` list of comma-separated MPI calls to be
    benchmarked, e.g., `--calls-list=MPI_Bcast,MPI_Allgather`
  - `--root-proc=<process_id>` root process for collective operations
  - `--operation=<mpi_op>` MPI operation applied by collective
    operations (where applicable), e.g., `--operation=MPI_BOR`.

    Supported operations: `MPI_BOR`, `MPI_BAND`, `MPI_LOR`, `MPI_LAND`,
    `MPI_MIN`, `MPI_MAX`, `MPI_SUM`, `MPI_PROD`
  - `--datatype=<mpi_type>` MPI datatype used by collective
    operations, e.g., `--datatype=MPI_CHAR`.

    Supported datatypes: `MPI_CHAR`, `MPI_INT`, `MPI_FLOAT`, `MPI_DOUBLE`
  - `--shuffle-jobs` shuffle experiments before running the benchmark
  - `--params=k1:v1,k2:v2` list of comma-separated `key:value` pairs
    to be printed in the benchmark output
  - `-f | --input-file=<path>` input file containing the list of
    benchmarking jobs (tuples of MPI function, message size, number of
    repetitions). It replaces all the other common options.
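The `--msize-interval` option can be read as a loop over exponents. The C sketch below uses a hypothetical helper, `expand_msize_interval`, which is not taken from the ReproMPI sources. It assumes the exponent runs from `min` to `max` inclusive in increments of `step`, so `min=1,max=4,step=1` expands to 2, 4, 8, and 16 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand --msize-interval=min=<min>,max=<max>,step=<step>
 * into message sizes in bytes. Assumption: exponents min, min+step, ... <= max,
 * i.e., consecutive sizes differ by a factor of 2^step.
 * Writes at most `capacity` sizes and returns how many were written. */
size_t expand_msize_interval(int min_exp, int max_exp, int step_exp,
                             size_t *sizes, size_t capacity) {
  size_t count = 0;

  assert(min_exp >= 0 && step_exp > 0);
  /* Keep every shift within the width of size_t. */
  assert(max_exp < (int)(sizeof(size_t) * 8));

  for (int e = min_exp; e <= max_exp && count < capacity; e += step_exp) {
    sizes[count++] = (size_t)1 << e;
  }
  return count;
}
```

Under the same assumption, `expand_msize_interval(0, 10, 5, sizes, 8)` would yield 1, 32, and 1024 bytes.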
### Options Related to the Window-based Synchronization

  - `--window-size=<win>` window size in microseconds for window-based synchronization


### Specific Options for the ReproMPI Benchmark

  - `--nrep=<nrep>` set the number of experiment repetitions
  - `--summary=<args>` list of comma-separated methods for summarizing
    the data (mean, median, min, max, var, stddev), e.g., `--summary=mean,max`
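The `--summary` methods correspond to standard descriptive statistics over the `nrep` measured run-times. The sketch below is illustrative only. The names `summary_t` and `summarize` are hypothetical. It assumes that `var` uses the unbiased (n - 1) estimator, which may differ from the benchmark's convention. `stddev` would be the square root of `var`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of nrep measured run-times. */
typedef struct {
  double mean;
  double median;
  double min;
  double max;
  double var; /* assumed unbiased (n - 1) estimator; stddev = sqrt(var) */
} summary_t;

/* qsort comparator for doubles in ascending order. */
static int cmp_double(const void *a, const void *b) {
  double x = *(const double *)a;
  double y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Returns 0 on success, -1 if n == 0 or memory allocation fails. */
int summarize(const double *times, size_t n, summary_t *out) {
  double *sorted;
  double sum = 0.0, sq = 0.0;

  assert(out != NULL);
  if (n == 0) {
    return -1;
  }
  sorted = malloc(n * sizeof *sorted);
  if (sorted == NULL) {
    return -1;
  }
  memcpy(sorted, times, n * sizeof *sorted);
  qsort(sorted, n, sizeof *sorted, cmp_double);

  for (size_t i = 0; i < n; i++) {
    sum += sorted[i];
  }
  out->mean = sum / (double)n;
  out->min = sorted[0];
  out->max = sorted[n - 1];
  /* Median: middle element, or mean of the two middle elements. */
  out->median = (n % 2 == 1) ? sorted[n / 2]
                             : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
  for (size_t i = 0; i < n; i++) {
    double d = sorted[i] - out->mean;
    sq += d * d;
  }
  out->var = (n > 1) ? sq / (double)(n - 1) : 0.0;

  free(sorted);
  return 0;
}
```

Sorting a copy keeps the caller's measurement order intact, which matters if the raw per-repetition values are printed as well.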
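## Supported Collective Operations

### MPI Collectives

  - `MPI_Allgather`
  - `MPI_Allreduce`
  - `MPI_Alltoall`
  - `MPI_Barrier`
  - `MPI_Bcast`
  - `MPI_Exscan`
  - `MPI_Gather`
  - `MPI_Reduce`
  - `MPI_Reduce_scatter`
  - `MPI_Reduce_scatter_block`
  - `MPI_Scan`
  - `MPI_Scatter`

### Mockup Functions of Various MPI Collectives

| **MPI_Allgather** | **MPI_Allreduce**            | **MPI_Alltoall** | **MPI_Bcast**     | **MPI_Gather** | **MPI_Reduce**            | **MPI_Reduce_scatter_block** | **MPI_Scan**       | **MPI_Scatter** |
|-------------------|------------------------------|------------------|-------------------|----------------|---------------------------|------------------------------|--------------------|-----------------|
| Default           | Default                      | Default          | Default           | Default        | Default                   | Default                      | Default            | Default         |
| Allgatherv        | Reduce+Bcast                 | Alltoallv        | Allgatherv        | Allgather      | Allreduce                 | Reduce+Scatter               | Exscan+Reducelocal | Bcast           |
| Allreduce         | Reducescatterblock+Allgather | Lane             | Scatter+Allgather | Gatherv        | Reducescatterblock+Gather | Reducescatter                | Lane               | Scatterv        |
| Alltoall          | Reducescatter+Allgatherv     |                  | Lane              | Reduce         | Reducescatter+Gatherv     | Allreduce                    | Hier               | Lane            |
| Gather+Bcast      | Lane                         |                  | Hier              | Lane           | Reducescatter             | Hier                         |                    | Hier            |
| Lane              | Hier                         |                  |                   | Hier           | Lane                      | Lane                         |                    |                 |
| Lane Zero         |                              |                  |                   |                | Hier                      |                              |                    |                 |
| Hier              |                              |                  |                   |                |                           |                              |                    |                 |


## Process Synchronization Methods

### MPI_Barrier
This is the default synchronization method enabled for the benchmark.

### Dissemination Barrier
To benchmark collective operations across multiple MPI libraries using
the same barrier implementation, the benchmark provides a
dissemination barrier that can replace the default `MPI_Barrier` to
synchronize processes.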
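A dissemination barrier over p processes completes in ⌈log₂ p⌉ rounds. In round k, each process notifies the process 2^k ranks ahead and waits for the one 2^k ranks behind, both modulo p. The C sketch below computes only this communication schedule for the textbook algorithm. It is not ReproMPI's implementation, and the function names are hypothetical. In an MPI program, each round would typically be a zero-byte `MPI_Sendrecv` that sends to the first peer and receives from the second.

```c
#include <assert.h>

/* Number of rounds of a dissemination barrier over p processes:
 * the smallest r such that 2^r >= p (0 rounds when p == 1). */
int dissemination_rounds(int p) {
  int rounds = 0;

  assert(p > 0);
  for (long dist = 1; dist < p; dist *= 2) {
    rounds++;
  }
  return rounds;
}

/* Rank that `rank` notifies in round `round`: (rank + 2^round) mod p. */
int dissemination_send_peer(int rank, int round, int p) {
  assert(p > 0 && rank >= 0 && rank < p && round >= 0 && round < 31);
  return (int)(((long)rank + (1L << round)) % p);
}

/* Rank that `rank` waits for in round `round`: (rank - 2^round) mod p. */
int dissemination_recv_peer(int rank, int round, int p) {
  assert(p > 0 && rank >= 0 && rank < p && round >= 0 && round < 31);
  long dist = (1L << round) % p; /* reduce first so the result stays >= 0 */
  return (int)(((long)rank - dist + p) % p);
}
```

Because every process both signals and waits in every round, no single process acts as a coordinator. This contrasts with a gather/broadcast-style barrier.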
To enable the dissemination barrier, the following flag has to be set
before compiling the benchmark (e.g., using the `ccmake` command).

```
ENABLE_BENCHMARK_BARRIER
```

Both barrier-based synchronization methods can optionally use a
double barrier before each measurement.

```
ENABLE_DOUBLE_BARRIER
```


### Window-based Synchronization

The ReproMPI benchmark implements a window-based process
synchronization mechanism. This mechanism estimates the clock offset/drift of
each process relative to a reference process. It then uses the
resulting global clocks to synchronize processes before each
measurement and to compute run-times.


### Timing procedure

  The MPI operation run-time is computed differently depending on the
  selected clock synchronization method. If global clocks are
  available, the run-time is computed as the difference between the
  latest exit time and the earliest start time across all processes.

  If a barrier-based synchronization is used, the run-time of an MPI
  call is computed as the largest local run-time across all processes.

  However, the timing procedure that relies on global clocks can also
  be used in combination with a barrier-based synchronization when the
  `ENABLE_GLOBAL_TIMES` flag is enabled.


### Clock resolution

By default, the benchmark uses the `MPI_Wtime` call to obtain the current time.
To obtain accurate measurements of short time intervals, the benchmark
can rely on the high-resolution `RDTSC`/`RDTSCP` instructions (if they are
available on the test machines) by setting one of the following flags:
```
ENABLE_RDTSC
ENABLE_RDTSCP
```

Additionally, the clock frequency of the CPU must be set to
obtain accurate measurements:
```
FREQUENCY_MHZ                    2300
```

The clock frequency can also be estimated automatically (as done by
the NetGauge tool) by enabling the following variable:
```
CALIBRATE_RDTSC
```

However, this method reduces the accuracy of the results, and we
advise manually setting the highest CPU frequency instead. More
details about the usage of `RDTSC`-based timers can be found in our
research report.
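The two timing rules from *Timing procedure*, together with the cycle-to-time conversion implied by `FREQUENCY_MHZ`, can be sketched in C as follows. All function names are hypothetical. The sketch assumes three things. First, per-process start and end timestamps are already in a common unit. Second, for the global-clock rule, those timestamps are already in global time. Third, the timestamps have been collected on one process (e.g., with `MPI_Gather`). Reading the TSC itself (e.g., through the `__rdtsc()` compiler intrinsic on x86) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a TSC cycle count to microseconds for a CPU
 * running at freq_mhz (cf. FREQUENCY_MHZ). 1 MHz = 1 cycle per microsecond. */
double cycles_to_usec(uint64_t cycles, double freq_mhz) {
  assert(freq_mhz > 0.0);
  return (double)cycles / freq_mhz;
}

/* Global clocks: latest exit time minus earliest start time over all
 * processes. start[i] and end[i] must already be in global time. */
double runtime_global(const double *start, const double *end, size_t nprocs) {
  assert(nprocs > 0);
  double first_start = start[0];
  double last_end = end[0];
  for (size_t i = 1; i < nprocs; i++) {
    if (start[i] < first_start) {
      first_start = start[i];
    }
    if (end[i] > last_end) {
      last_end = end[i];
    }
  }
  return last_end - first_start;
}

/* Barrier-based synchronization: largest local run-time (end - start)
 * across all processes; local clocks need not agree with each other. */
double runtime_local_max(const double *start, const double *end,
                         size_t nprocs) {
  assert(nprocs > 0);
  double max_time = end[0] - start[0];
  for (size_t i = 1; i < nprocs; i++) {
    double t = end[i] - start[i];
    if (t > max_time) {
      max_time = t;
    }
  }
  return max_time;
}
```

Take the sample timestamps `start = {0.0, 1.0, 0.5}` and `end = {3.0, 3.5, 4.0}` (in microseconds). The global-clock rule gives 4.0, whereas the largest local run-time is 3.5. The two rules can therefore report different values for the same call.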
## List of Compilation Flags

This is the full list of compilation flags that can be used to control
all the previously detailed configuration parameters.

```
 CALIBRATE_RDTSC                  OFF
 COMPILE_BENCH_TESTS              OFF
 COMPILE_SANITY_CHECK_TESTS       OFF
 ENABLE_BENCHMARK_BARRIER         OFF
 ENABLE_DOUBLE_BARRIER            OFF
 ENABLE_GLOBAL_TIMES              OFF
 ENABLE_LOGP_SYNC                 OFF
 ENABLE_RDTSC                     OFF
 ENABLE_RDTSCP                    OFF
 ENABLE_WINDOWSYNC_HCA            OFF
 ENABLE_WINDOWSYNC_JK             OFF
 ENABLE_WINDOWSYNC_SK             OFF
 FREQUENCY_MHZ                    2300
```

## Clock Synchronization Algorithms

### HCA [1]

### HCA2 [1]

### HCA3 [4]

### Topo1 [4]

### Topo2 [4]

- two-level hierarchical clock synchronization
  - top level for synchronization between nodes
  - bottom level within a compute node
- default
  - top: HCA3
  - bottom: ClockPropagation
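The two-level scheme of Topo2 can be pictured as a composition of two clock models. A bottom-level model maps a process's clock to its node's reference clock. A top-level model maps that node reference to the global reference. The C sketch below assumes both levels produce linear models of the form offset(t) = intercept + slope · t. It composes them into a single equivalent model. The linear form and all names are assumptions for illustration; this is not the actual HCA3/ClockPropagation code.

```c
#include <assert.h>

/* Hypothetical linear clock model: offset(t) = intercept + slope * t,
 * so the reference time for local time t is t - offset(t). */
typedef struct {
  double slope;
  double intercept;
} clock_model_t;

/* Map a timestamp through one model. */
double apply_model(const clock_model_t *m, double t) {
  return t - (m->intercept + m->slope * t);
}

/* Compose a bottom-level model (process -> node reference) with a
 * top-level model (node reference -> global reference).
 * With u = t(1 - s1) - b1 and g = u(1 - s2) - b2:
 *   g = t(1 - s1)(1 - s2) - (b1(1 - s2) + b2),
 * which has the same linear form with
 *   slope = 1 - (1 - s1)(1 - s2), intercept = b1(1 - s2) + b2. */
clock_model_t compose_models(const clock_model_t *bottom,
                             const clock_model_t *top) {
  clock_model_t combined;
  combined.slope = 1.0 - (1.0 - bottom->slope) * (1.0 - top->slope);
  combined.intercept = bottom->intercept * (1.0 - top->slope) + top->intercept;
  return combined;
}
```

Collapsing the two levels into one model means each process could convert its timestamps to global time with a single multiply-add, however the two levels were estimated.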