{"id":18422007,"url":"https://github.com/spcl/perf-taint","last_synced_at":"2025-08-01T22:43:05.297Z","repository":{"id":48300518,"uuid":"240282500","full_name":"spcl/perf-taint","owner":"spcl","description":"Taint-based program analysis framework for empirical performance modeling.","archived":false,"fork":false,"pushed_at":"2022-09-28T14:38:54.000Z","size":39280,"stargazers_count":5,"open_issues_count":20,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-07T14:39:50.013Z","etag":null,"topics":["clang","compiler","hpc","llvm","performance-analysis","performance-modeling"],"latest_commit_sha":null,"homepage":"","language":"LLVM","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-13T14:40:11.000Z","updated_at":"2024-06-10T10:28:28.000Z","dependencies_parsed_at":"2022-09-16T08:22:40.818Z","dependency_job_id":null,"html_url":"https://github.com/spcl/perf-taint","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/spcl/perf-taint","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fperf-taint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fperf-taint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fperf-taint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fperf-taint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spcl","download_url":"https://codeload.github.com/spcl/perf-taint/tar.
gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fperf-taint/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260173650,"owners_count":22969866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clang","compiler","hpc","llvm","performance-analysis","performance-modeling"],"created_at":"2024-11-06T04:27:41.178Z","updated_at":"2025-06-16T14:07:39.439Z","avatar_url":"https://github.com/spcl.png","language":"LLVM","funding_links":[],"categories":[],"sub_categories":[],"readme":"# perf-taint\n\n**LLVM-based taint analysis framework for HPC performance modeling.**\n\n[![CircleCI](https://circleci.com/gh/spcl/perf-taint.svg?style=shield)](https://circleci.com/gh/spcl/perf-taint)\n![Release](https://img.shields.io/github/v/release/spcl/perf-taint)\n![Docker Image](https://img.shields.io/docker/v/spcleth/perf-taint/latest?label=Docker)\n![GitHub issues](https://img.shields.io/github/issues/spcl/perf-taint)\n![GitHub pull requests](https://img.shields.io/github/issues-pr/spcl/perf-taint)\n\nPerf-taint implements taint-based analysis of the program's performance to find the performance-relevant\nparameters and discover functions that impact the program's performance. Perf-taint generates\na structured JSON file describing relevant functions. We use that to enhance Extra-P empirical\nperformance modeling tool with our new performance analysis and construct\nhybrid, white-box performance models.\n\nThe tool consists of two parts: an LLVM compiler pass and a runtime library. 
The compiler pass\nperforms static analysis to determine which functions are definitely not performance-relevant\nand instruments the code with taint propagation. The resulting application is linked with our\nruntime library that aggregates tainted data and constructs [a JSON performance profile](docs/json.md).\nThe profile [is passed to Extra-P](docs/extrap.md) to use the program information in the modeling process.\nOur tool supports [parallel MPI programs](docs/mpi.md), and [OpenMP support](docs/openmp.md) is planned for the next release.\nThe documentation describes in detail [the design and implementation of our\ntool](docs/design.md) and provides [a step-by-step explanation](docs/example.md) of our compilation and modeling pipeline.\n\nperf-taint can be used with our Docker image `spcleth/perf-taint:latest`, or the tool\ncan be [installed locally](#installation).\n\nWhen using perf-taint, please cite [our PPoPP'21 paper](https://doi.org/10.1145/3437801.3441613).\nA preprint of our paper is [available on arXiv](https://arxiv.org/abs/2012.15592), and you can\nfind more details about research work [in this paper summary](https://mcopik.github.io/projects/perf_taint/).\n\n```\n@inproceedings{10.1145/3437801.3441613,\n  author = {Copik, Marcin and Calotoiu, Alexandru and Grosser, Tobias and Wicki, Nicolas and Wolf, Felix and Hoefler, Torsten},\n  title = {Extracting Clean Performance Models from Tainted Programs},\n  year = {2021},\n  isbn = {9781450382946},\n  publisher = {Association for Computing Machinery},\n  address = {New York, NY, USA},\n  url = {https://doi.org/10.1145/3437801.3441613},\n  doi = {10.1145/3437801.3441613},\n  booktitle = {Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},\n  pages = {403–417},\n  numpages = {15},\n  keywords = {taint analysis, high-performance computing, LLVM, performance modeling, compiler techniques},\n  location = {Virtual Event, Republic of Korea},\n  series = {PPoPP 
'21}\n}\n```\n\n## Requirements\n\n* LLVM 9.0 or higher.\n* Alternatively, use our [LLVM fork](https://github.com/nwicki/llvm-project/) to enable control-flow tainting.\n* libc++ 9.0 or higher, built with dfsan tainting - [see instructions](https://mcopik.github.io/blog/2020/dataflow/).\n\nWe provide Docker images `spcleth/perf-taint:base-dfsan-9.0` (data-flow tainting) and `spcleth/perf-taint:base-cfsan-9.0` (control-flow and data-flow tainting) with `LLVM` and `libcxx` installed. In addition to LLVM and Clang, the images contain additional build tools such as `CMake` and `ninja`.\n\n## Installation\n\nTo build, pass clang as the default compiler, and provide paths to the LLVM installation\nand to a tainted installation of `libc++`.\n\n```\ncmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_DIR=${PATH_TO_LLVM} -DLIBCXX_PATH=${PATH_TO_LIBCXX}  /path/to/perf-taint\n```\n\n### Building in Docker environment\n\nTo avoid the long and complex process of setting up `LLVM` and `libcxx`, you can build the tool within the Docker environment:\n\n```shell\ndocker run -it -v $(pwd)/perf-taint/:/code -v $(pwd)/build_perf_taint/:/build spcleth/perf-taint:base-cfsan-9.0 /bin/bash -c \"cd /build \u0026\u0026 cmake -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_DIR=/opt/llvm/ -DLIBCXX_PATH=/opt/llvm /code \u0026\u0026 cmake --build /build -- -j4\"\n```\n\nThe created build directory can be mounted in the Docker container again to process source code for instrumentation.\n\n### CMake Options\n\nThe following options are supported when building the toolchain:\n\n\n| Arguments         |                                                                         |\n|-------------------|-------------------------------------------------------------------------|\n| **Mandatory**     |                                                                         |\n| `LLVM_DIR`        | Path to the LLVM installation (no search in default paths is conducted). 
|\n| `LIBCXX_PATH`     | Path to the `libc++` installation built with LLVM's DataFlowSanitizer.  |\n| **Optional**      |                                                                         |\n| `LLVM_WITH_CFSAN` | Use the control-flow tainting provided by LLVM fork (**default OFF**)   |\n| `WITH_MPI`        | Build runtime with support for MPI programs (**default ON**)            |\n| `JSONCPP_PATH`    | Path to an installation of the jsoncpp library. When not provided, the library is downloaded and configured during build. |\n| `WITH_UNIT_TESTS` | Enable unit tests (**default ON**)                                      |\n| `WITH_REGRESSION_TESTS` | Enable regression tests (**default ON**)                          |\n| `OMP_PATH`        | Path to a tainted installation of the OpenMP library; setting it enables support for OpenMP programs (**experimental**) |\n\nVerify the build by running `llvm-lit tests/unit` in the build directory.\n\n## Usage\n\nOur pipeline requires a minor code modification to allow registration and tainting of program parameters.\nFor each program variable which should be treated as a potentially performance-relevant parameter,\nusers should add the `EXTRAP` annotation and a call to the `register_variable` function.\n\n```\nint size EXTRAP = atoi(argv[1]);\nregister_variable(\u0026size, \"size\");\n```\n\nThen, the source code should be compiled into the LLVM IR bitcode.\nWe provide wrappers `/build-dir/bin/clang` and `/build-dir/bin/clang++` that are configured\nto use the selected LLVM installation. They behave like a regular C/C++ compiler, except\nthat our wrappers generate bitcode from the compilation of translation units. This approach works\nwell when applied to C/C++ projects implemented with Makefiles or CMake. 
The IR generation\nhappens while compiling to object code, so the build process is not interrupted.\n\nThe helper script `bin/perf-taint` provides an integrated tool that accepts\nIR files, runs our instrumentation together with dfsan, and builds an executable.\nIn addition, the tool includes a handy wrapper that fills all necessary passes and\nimplements the entire compilation pipeline:\n\n```\n/build-dir/bin/perf-taint -t ${output_name} ${input_llvm_ir}\n```\n\nThe documentation provides [a step-by-step explanation](docs/example.md) of our\ncompilation and modeling pipeline, and [covers two HPC benchmarks](docs/benchmarks.md): LULESH\nand MILC's su3_rmd.\n\n## Limitations\n\nWhile `perf-taint` supports a wide set of C++, HPC, and MPI applications, it has a few limitations:\n* OpenMP support is experimental and might not work as expected.\n* Multithreaded applications are not supported at the moment. MPI applications with a single thread per process are fine.\n* Recursive functions are not supported and they're not detected as a part of computational complexity (#16).\n* Taint labels can be propagated in MPI messages, but this is not supported at the moment - so far we have not found this limitation to be problematic.\n* When discovering the taint dependency in MPI calls, we support only trivial MPI datatypes. Derived datatypes are not supported.\n* When linking LLVM bitcode with `llvm-link`, copies of the same function, e.g., static functions present in header files, might not be resolved.\nThus, the same function might be seen by our instrumentation as \"f\", \"f.2\", \"f.3\", etc.\nTo merge such functions, use the pass option: `-perf-taint-remove-duplicates`.\nThe implementation uses LLVM's `MergeFunctions` pass, which might have the side effect of merging\ndifferent functions presenting the same behavior. 
To avoid this problem, we offer an experimental,\ncustom duplicate removal enabled with `-perf-taint-remove-duplicates-experimental`.\n**WARNING**: this\noption is experimental! It checks that the functions share the same name suffix and they're located\nin the same debug location. However, using different preprocessing definitions might generate\ndifferent codes for the same function in the same location - we don't verify that at the moment.\n\n## Docker\n\n## Testing\n\nWe implement tests as C++ programs with compilation instructions inserted in the header.\nThe tests are executed with the help of `llvm-lit`, and their execution can be easily parallelized with `-j$PROC`.\nTests are split into `regression` tests, which might use multiple cores and take a few minutes to execute,\nand simple `unit` tests that are sequential and small.\n\nFor details on the compilation instructions, please inspect the definitions in [our lit\nconfiguration file](tests/lit.cfg.in).\n\n## Authors\n\n* [Marcin Copik, ETH Zurich](https://github.com/mcopik/) - main developer.\n* [Nicolas Wicki, ETH Zurich](https://github.com/nwicki/) - contributed the control-flow tainting in LLVM and perf-taint, in addition to various bug fixes.\n* [Alexandru Calotoiu, ETH Zurich and TU Darmstadt](https://github.com/acalotoiu) - worked on the Extra-P integration.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fperf-taint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspcl%2Fperf-taint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fperf-taint/lists"}