{"id":18421972,"url":"https://github.com/spcl/llamp","last_synced_at":"2025-04-13T12:11:28.901Z","repository":{"id":236601525,"uuid":"785132033","full_name":"spcl/llamp","owner":"spcl","description":"Project repository for the SC24 paper LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming","archived":false,"fork":false,"pushed_at":"2024-08-07T04:50:26.000Z","size":13486,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-02-10T00:57:55.864Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-11T09:03:24.000Z","updated_at":"2024-11-29T18:26:40.000Z","dependencies_parsed_at":"2024-04-28T02:00:24.178Z","dependency_job_id":"4ade54e4-080e-45f3-9adb-93b4b23de407","html_url":"https://github.com/spcl/llamp","commit_stats":null,"previous_names":["spcl/llamp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fllamp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fllamp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fllamp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fllamp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spcl","download_url":"https://codeload.githu
b.com/spcl/llamp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248710445,"owners_count":21149190,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:27:34.369Z","updated_at":"2025-04-13T12:11:28.871Z","avatar_url":"https://github.com/spcl.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming\n\n\nLLAMP (**L**ogGPS and **L**inear Programming based **A**nalyzer for **M**PI **P**rograms) is a toolchain designed for efficient analysis and quantification of network latency sensitivity and tolerance in HPC applications. By leveraging the LogGOPSim framework, LLAMP records MPI traces of MPI programs and transforms them into execution graphs. These graphs, through the use of the LogGPS model, are then converted into linear programs. 
They can be solved rapidly by modern linear solvers, allowing us to efficiently gather valuable metrics, such as the predicted runtime of programs and critical path metrics.\n\n## Quick start\n### Dependencies\n- Python3\n- [Gurobi](https://www.gurobi.com/downloads/) linear solver\n- [igraph](https://igraph.org/)\n\nTo run the latency injection experiment, you need to download the following source code and follow the instructions in `validation/latency-injector/README.md`.\n- [MPICH](https://www.mpich.org/downloads/) (4.1.2)\n- [UCX](https://github.com/openucx/ucx) (1.16.x)\n\n\n### Generating Linear Programs from MPI Traces\n\nFirst, compile `LogGOPSim`, `liballprof`, and `Schedgen` in their corresponding directories.\n- To compile LogGOPSim, make sure you have `graphviz` installed. Then, run `make` to build the executable.\n- To compile Schedgen, simply run `make` to build the executable. If you want to change the point-to-point algorithm used for a specific collective operation, you have to modify the file `process_trace.cpp` and recompile Schedgen.\n- To compile liballprof, change the compilers in `setup.sh` according to your system, and run `bash setup.sh`. The library can then be found in `liballprof/.libs`.\n\nAfter the LogGOPSim toolchain has been built, use the `lp_gen.py` script in the `scripts` directory to generate linear programs for MPI applications directly. 
Descriptions of the parameters can be obtained by running `python3 lp_gen.py -h`.\n\nFor example, to create a linear program for LULESH, execute the following command if you are using Open MPI:\n```console\n\u003e python3 lp_gen.py -c \"mpirun -x LD_PRELOAD -np 8 \u003cpath-to-lulesh\u003e -i 100 -s 8\" -p lulesh_test -v\n```\nIf you are using MPICH:\n```console\n\u003e python3 lp_gen.py -c \"mpirun -envall -np 8 \u003cpath-to-lulesh\u003e -i 100 -s 8\" -p lulesh_test -v\n```\nIf you are running your application on a cluster that uses Slurm:\n```console\n\u003e python3 lp_gen.py -c \"srun --export=ALL -N2 -n8 \u003cpath-to-lulesh\u003e -i 100 -s 8\" -p lulesh_test -v\n```\nThis will create a directory named `lulesh_test` under the root project directory, which will contain the traces inside the `traces` folder. The generated linear programming model will be saved as both an `.lp` file and an `.mps` file.\n\nIf the traces have already been collected and you want to try out different parameters, such as `o` or `G`, add the `-s` argument when running the script to skip tracing.\n\nTo generate linear programs for [ICON](https://icon-model.org/), a few changes need to be made. To start, follow the instructions in `case-studies/icon/README.md` to compile ICON. Then, build `liballprof2`. The script for running ICON can be found in `case-studies/icon/`; make sure to set the paths as well as the `START` command correctly in the script. Type the following command to trace and produce linear programs for ICON:\n```console\n\u003e python3 lp_gen.py -c \"bash ../case-studies/icon/run-icon.sh\" --icon -v -p icon_test\n```\n\n### Linear Program Analysis\n\nTo perform analysis on linear programs, use the `main.py` script inside `mpi-dep-graph`.\n\n#### Network Latency Sensitivity\n\nRun the following command inside `mpi-dep-graph` to generate the network latency sensitivity curves for your application. 
The results will be stored as CSV files.\n```console\n\u003e python3 main.py --load-lp-model-path ../lulesh_test/lulesh_test.lp -a sensitivity --output-dir ../lulesh_test/\n```\nThe interval of interest for $L$ can be specified via the `--l-min` and `--l-max` arguments. Change the `--step` argument to set the resolution.\n\n#### Network Latency Tolerance\nRun the following command inside `mpi-dep-graph` to generate the 1% network latency tolerance for your application:\n```console\n\u003e python3 main.py --load-lp-model-path ../lulesh_test/lulesh_test.lp -a buffer\n```\nTo change the performance degradation threshold, use the `--lat-buffer-thresh` argument. To specify the baseline of application runtime manually, use the `--lat-buf-baseline` argument.\n\n\n### Misc\n\n- If you only intend to generate a performance forecast for your application, you can replace the `-a` argument with `--solve` when executing `main.py`.\n- If you want to save the MPI execution graph and have access to it, use `--export-graph-path` to specify the path for the graph. The graph will be saved as a `pkl` file.\n- We currently do not provide an interface to change the network topology easily. You will have to modify the `topology` variable in `main.py` manually to adjust the configurations for Fat Tree and Dragonfly topologies.\n\n\n### Contributions\nLLAMP, with its linear programming approach, opens up a world of analysis possibilities that are just waiting to be explored. It is really in the hands of the users to discover new metrics or come up with interesting uses for LLAMP. Inside the source code, you will find numerous experimental features, like tools for __MPI process placement__. If you have any ideas or an improvement, do not hesitate to dive in and submit a pull request! 
We are looking forward to seeing the new applications brought forward by the community.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fllamp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspcl%2Fllamp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fllamp/lists"}