{"id":19706081,"url":"https://github.com/llnl/mpibench","last_synced_at":"2025-04-29T16:32:44.959Z","repository":{"id":66082974,"uuid":"103687607","full_name":"LLNL/mpiBench","owner":"LLNL","description":"MPI benchmark to test and measure collective performance","archived":false,"fork":false,"pushed_at":"2021-06-29T01:21:28.000Z","size":21,"stargazers_count":50,"open_issues_count":2,"forks_count":19,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-06-15T00:02:34.706Z","etag":null,"topics":["mpi","performance"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLNL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-09-15T18:13:11.000Z","updated_at":"2024-06-03T06:14:25.000Z","dependencies_parsed_at":"2023-02-21T03:15:41.899Z","dependency_job_id":null,"html_url":"https://github.com/LLNL/mpiBench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FmpiBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FmpiBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FmpiBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FmpiBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLNL","download_url":"https://codeload.github.com/LLNL/mpiBench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224179671,"owners_count":17269103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mpi","performance"],"created_at":"2024-11-11T21:33:44.663Z","updated_at":"2024-11-11T21:33:47.277Z","avatar_url":"https://github.com/LLNL.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mpiBench\nTimes MPI collectives over a series of message sizes\n\n# What is mpiBench?\n\nmpiBench.c\n\nThis program measures MPI collective performance for a range of\nmessage sizes.  The user may specify:\n- the collective to perform,\n- the message size limits,\n- the number of iterations to perform,\n- the maximum memory a process may allocate for MPI buffers,\n- the maximum time permitted for a given test,\n- and the number of Cartesian dimensions to divide processes into.\n\nThe default behavior of mpiBench will run from 0-256K byte messages\nfor all supported collectives on MPI_COMM_WORLD with a 1G buffer\nlimit.  Each test will execute as many iterations as it can to fit\nwithin a default time limit of 50000 usecs.\n\ncrunch_mpiBench\n\nThis is a perl script which can be used to filter data and generate\nreports from mpiBench output files.  
# Basic Usage

Build:

    make

Run:

    srun -n <procs> ./mpiBench > output.txt

Analyze:

    crunch_mpiBench output.txt

# Build Instructions

There are several make targets available:
- make       -- simple build
- make nobar -- build without barriers between consecutive collective invocations
- make debug -- build with "-g -O0" for debugging purposes
- make clean -- clean the build

If you'd like to build manually without the makefiles, there are some
compile-time options that you should be aware of:

    -D NO_BARRIER       - drop barrier between consecutive collective
                          invocations
    -D USE_GETTIMEOFDAY - use gettimeofday() instead of MPI_Wtime() for
                          timing info

# Usage Syntax

    Usage:  mpiBench [options] [operations]

    Options:
      -b <byte>  Beginning message size in bytes (default 0)
      -e <byte>  Ending message size in bytes (default 1K)
      -i <itrs>  Maximum number of iterations for a single test
                 (default 1000)
      -m <byte>  Process memory buffer limit (send+recv) in bytes
                 (default 1G)
      -t <usec>  Time limit for any single test in microseconds
                 (default 0 = infinity)
      -d <ndim>  Number of dimensions to split processes in
                 (default 0 = MPI_COMM_WORLD only)
      -c         Check receive buffer for expected data in last
                 iteration (default disabled)
      -C         Check receive buffer for expected data every
                 iteration (default disabled)
      -h         Print this help screen and exit
      where <byte> = [0-9]+[KMG], e.g., 32K or 64M

    Operations:
      Barrier
      Bcast
      Alltoall, Alltoallv
      Allgather, Allgatherv
      Gather, Gatherv
      Scatter
      Allreduce
      Reduce
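The <byte> arguments accept an optional K, M, or G suffix, e.g., 32K or 64M.
Below is a small sketch of how such a value could be parsed into a byte
count; the helper name and error handling are illustrative and not taken
from the mpiBench source.

    #include <stdlib.h>

    /* Parse a size such as "0", "32K", or "64M" into bytes.
       Sets *ok = 0 on malformed input.  Illustrative helper only. */
    size_t parse_bytes(const char *s, int *ok)
    {
        char *end = NULL;
        unsigned long long val = strtoull(s, &end, 10);
        size_t scale = 1;

        *ok = (end != s);                 /* require at least one digit */
        if (*end == 'K')      { scale = 1ULL << 10; end++; }
        else if (*end == 'M') { scale = 1ULL << 20; end++; }
        else if (*end == 'G') { scale = 1ULL << 30; end++; }
        if (*end != '\0') *ok = 0;        /* reject trailing characters */

        return (size_t)(val * scale);
    }

With a helper like this, -e 2K maps to 2048 bytes and -m 1G to 2^30 bytes.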
# Examples

## mpiBench

Run the default set of tests:

    srun -n2 -ppdebug mpiBench

Run the default message size range and iteration count for Alltoall, Allreduce, and Barrier:

    srun -n2 -ppdebug mpiBench Alltoall Allreduce Barrier

Run from 32-256 bytes and time across 100 iterations of Alltoall:

    srun -n2 -ppdebug mpiBench -b 32 -e 256 -i 100 Alltoall

Run from 0-2K bytes and the default iteration count for Gather, but
reduce the iteration count, as necessary, so each message size
test finishes within 100,000 usecs:

    srun -n2 -ppdebug mpiBench -e 2K -t 100000 Gather

## crunch_mpiBench

Show data for just Alltoall:

    crunch_mpiBench -op Alltoall out.txt

Merge data from several files into a single report:

    crunch_mpiBench out1.txt out2.txt out3.txt

Display effective bandwidth for Allgather and Alltoall:

    crunch_mpiBench -bw -op Allgather,Alltoall out.txt

Compare times in output files in dir1 with those in dir2:

    crunch_mpiBench -data DIR1_DATA dir1/* -data DIR2_DATA dir2/*

# Additional Notes

Rank 0 always acts as the root process for collectives which involve
a root.

If the minimum and maximum are quite different, then some processes
may be escaping ahead to start later iterations before the last one
has completely finished.  In this case, one may use the maximum time
reported or insert a barrier between consecutive invocations (build
with "make" instead of "make nobar") to synchronize the processes.

For Reduce and Allreduce, vectors of doubles are added, so message
sizes of 1, 2, and 4 bytes are skipped.

The "make" and "make nobar" targets build mpiBench with timing kernels
like the following:

       "make"              "make nobar"
    start=timer()        start=timer()
    for(i=0;i<N;i++)     for(i=0;i<N;i++)
    {                    {
      MPI_Gather()         MPI_Gather()
      MPI_Barrier()
    }                    }
    end=timer()          end=timer()
    time=(end-start)/N   time=(end-start)/N

"make nobar" may allow processes to escape ahead, but it does not
include the cost of the barrier.
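The notes above refer to the minimum and maximum times reported across
processes.  One common way to collect such per-rank statistics is with MPI
reductions, sketched below; this is illustrative and not necessarily how
mpiBench itself aggregates its results.

    #include <mpi.h>
    #include <stdio.h>

    /* Given each rank's measured time (seconds), print min/avg/max on
       rank 0.  Illustrative only; mpiBench's reporting may differ. */
    void report_stats(double local_time, MPI_Comm comm)
    {
        int rank, procs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &procs);

        double tmin, tmax, tsum;
        MPI_Reduce(&local_time, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, comm);
        MPI_Reduce(&local_time, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
        MPI_Reduce(&local_time, &tsum, 1, MPI_DOUBLE, MPI_SUM, 0, comm);

        if (rank == 0) {
            printf("min %g  avg %g  max %g (usec)\n",
                   tmin * 1.0e6, tsum / procs * 1.0e6, tmax * 1.0e6);
        }
    }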