{"id":18711213,"url":"https://github.com/rocm/omnitrace","last_synced_at":"2025-05-16T07:05:38.699Z","repository":{"id":36966031,"uuid":"463024096","full_name":"ROCm/omnitrace","owner":"ROCm","description":"Omnitrace: Application Profiling, Tracing, and Analysis","archived":false,"fork":false,"pushed_at":"2025-04-03T13:36:37.000Z","size":6458,"stargazers_count":311,"open_issues_count":21,"forks_count":28,"subscribers_count":15,"default_branch":"amd-staging","last_synced_at":"2025-04-19T22:54:34.135Z","etag":null,"topics":["binary-instrumentation","code-coverage","cpu-profiler","dynamic-instrumentation","gpu-profiler","hardware-counters","instrumentation-profiler","linux","performance-analysis","performance-metrics","performance-monitoring","profiler","profiling","python","python-profiler","sampling-profiler","tracing"],"latest_commit_sha":null,"homepage":"https://rocm.docs.amd.com/projects/omnitrace/en/docs-6.2.4/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ROCm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-24T05:44:58.000Z","updated_at":"2025-04-14T16:29:01.000Z","dependencies_parsed_at":"2023-02-19T02:15:43.300Z","dependency_job_id":"421e3a39-209d-46de-82cd-ffe863fb43c8","html_url":"https://github.com/ROCm/omnitrace","commit_stats":{"total_commits":335,"total_committers":19,"mean_commits":17.63157894736842,"dds":"0.18208955223880596","last_synced_commit":"927613f7e67e25e718f62d9eaa3969f299483c6a"},"previous_names":["rocm/omnitrace","amdresearch/omnitrace"],"tags_count":40,"template"
:false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fomnitrace","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fomnitrace/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fomnitrace/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fomnitrace/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ROCm","download_url":"https://codeload.github.com/ROCm/omnitrace/tar.gz/refs/heads/amd-staging","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254485062,"owners_count":22078767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-instrumentation","code-coverage","cpu-profiler","dynamic-instrumentation","gpu-profiler","hardware-counters","instrumentation-profiler","linux","performance-analysis","performance-metrics","performance-monitoring","profiler","profiling","python","python-profiler","sampling-profiler","tracing"],"created_at":"2024-11-07T12:37:52.335Z","updated_at":"2025-05-16T07:05:33.688Z","avatar_url":"https://github.com/ROCm.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Omnitrace: Application Profiling, Tracing, and Analysis\n\n[![Ubuntu 20.04 with GCC, ROCm, and MPI](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml)\n[![Ubuntu 22.04 (GCC, Python, 
ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml)\n[![OpenSUSE 15.x with GCC](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml)\n[![RedHat Linux (GCC, Python, ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml)\n[![Installer Packaging (CPack)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml)\n[![Documentation](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml)\n\n\u003e [!NOTE]\n\u003e Omnitrace is being rebranded to ROCm Systems Profiler and its new home is \u003chttps://github.com/ROCm/rocprofiler-systems\u003e.\nAll future development will occur in the new repository; this includes upgrading the tool to use [rocprofiler-sdk](https://github.com/ROCm/rocprofiler-sdk).\nThis repository will remain open for some time and can be used with versions of ROCm before the introduction of rocprofiler-sdk (that is, before ROCm version 6.2).\n\n## Overview\n\nAMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems.\nIf you are familiar with [rocprof](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/how-to/using-rocprof.html) and/or [uProf](https://developer.amd.com/amd-uprof/),\nyou will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities.\n\nOmnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU.\nIt is capable of gathering the performance information of functions through any 
combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks.\nOmnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics.\nIn addition to runtimes, omnitrace supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics\nsuch as the memory usage, page-faults, and context-switches, and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.\n\n\u003e [!NOTE]\n\u003e Full documentation is available at [Omnitrace documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html) in an organized, easy-to-read, searchable format.\nThe documentation source files reside in the [`/docs`](/docs) folder of this repository. For information on contributing to the documentation, see\n[Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html)\n\n### Data Collection Modes\n\n- Dynamic instrumentation\n  - Runtime instrumentation\n    - Instrument executable and shared libraries at runtime\n  - Binary rewriting\n    - Generate a new executable and/or library with instrumentation built-in\n- Statistical sampling\n  - Periodic software interrupts per-thread\n- Process-level sampling\n  - Background thread records process-, system- and device-level metrics while the application executes\n- Causal profiling\n  - Quantifies the potential impact of optimizations in parallel codes\n\n### Data Analysis\n\n- High-level summary profiles with mean/min/max/stddev statistics\n  - Low overhead, memory efficient\n  - Ideal for running at scale\n- Comprehensive traces\n  - Every individual event/measurement\n- Application speedup predictions resulting from potential optimizations in functions and lines of code (causal profiling)\n\n### Parallelism API Support\n\n- HIP\n- HSA\n- 
Pthreads\n- MPI\n- Kokkos-Tools (KokkosP)\n- OpenMP-Tools (OMPT)\n\n### GPU Metrics\n\n- GPU hardware counters\n- HIP API tracing\n- HIP kernel tracing\n- HSA API tracing\n- HSA operation tracing\n- System-level sampling (via rocm-smi)\n  - Memory usage\n  - Power usage\n  - Temperature\n  - Utilization\n\n### CPU Metrics\n\n- CPU hardware counters sampling and profiles\n- CPU frequency sampling\n- Various timing metrics\n  - Wall time\n  - CPU time (process and/or thread)\n  - CPU utilization (process and/or thread)\n  - User CPU time\n  - Kernel CPU time\n- Various memory metrics\n  - High-water mark (sampling and profiles)\n  - Memory page allocation\n  - Virtual memory usage\n- Network statistics\n- I/O metrics\n- ... many more\n\n## Quick Start\n\n### Installation\n\n- Visit the [Releases](https://github.com/ROCm/omnitrace/releases) page\n- Select the appropriate installer (recommendation: the `.sh` scripts do not require super-user privileges, unlike the DEB/RPM installers)\n  - If targeting a ROCm application, find the installer script with the matching ROCm version\n  - If you are unsure about your Linux distro, check `/etc/os-release` or use the `omnitrace-install.py` script\n\nAlternatively, download the `omnitrace-install.py` script and specify `--prefix \u003cinstall-directory\u003e` when\nexecuting it. This script will attempt to auto-detect a compatible OS distribution and version.\nIf ROCm support is desired, specify `--rocm X.Y` where `X` is the ROCm major version and `Y`\nis the ROCm minor version, e.g. 
`--rocm 5.4`.\n\n```console\nwget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py\npython3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4\n```\n\nSee the [Omnitrace installation guide](https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html) for detailed information.\n\n### Setup\n\n\u003e NOTE: Replace `/opt/omnitrace` below with installation prefix as necessary.\n\n- Option 1: Source `setup-env.sh` script\n\n```bash\nsource /opt/omnitrace/share/omnitrace/setup-env.sh\n```\n\n- Option 2: Load modulefile\n\n```bash\nmodule use /opt/omnitrace/share/modulefiles\nmodule load omnitrace\n```\n\n- Option 3: Manual\n\n```bash\nexport PATH=/opt/omnitrace/bin:${PATH}\nexport LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH}\n```\n\n### Omnitrace Settings\n\nGenerate an omnitrace configuration file using `omnitrace-avail -G omnitrace.cfg`. Optionally, use `omnitrace-avail -G omnitrace.cfg --all` for\na verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable\n[perfetto](https://perfetto.dev/), [timemory](https://github.com/NERSC/timemory), sampling, and process-level sampling by default\nand tweak some sampling default values:\n\n```console\n# ...\nOMNITRACE_TRACE                = true\nOMNITRACE_PROFILE              = true\nOMNITRACE_USE_SAMPLING         = true\nOMNITRACE_USE_PROCESS_SAMPLING = true\n# ...\nOMNITRACE_SAMPLING_FREQ        = 50\nOMNITRACE_SAMPLING_CPUS        = all\nOMNITRACE_SAMPLING_GPUS        = $env:HIP_VISIBLE_DEVICES\n```\n\nOnce the configuration file is adjusted to your preferences, either export the path to this file via `OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg`\nor place this file in `${HOME}/.omnitrace.cfg` to ensure these values are always read as the default. 
If you wish to change any of these settings,\nyou can override them via environment variables or by specifying an alternative `OMNITRACE_CONFIG_FILE`.\n\n### Call-Stack Sampling\n\nThe `omnitrace-sample` executable performs call-stack sampling on a target application without binary instrumentation.\nUse a double-hyphen (`--`) to separate the command-line arguments for `omnitrace-sample` from the target application and its arguments.\n\n```shell\nomnitrace-sample --help\nomnitrace-sample \u003comnitrace-options\u003e -- \u003cexe\u003e \u003cexe-options\u003e\nomnitrace-sample -f 1000 -- ls -la\n```\n\n### Binary Instrumentation\n\nThe `omnitrace-instrument` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside\nthe execution of an instrumented binary to help \"fill in the gaps\" between the instrumentation by setting the `OMNITRACE_USE_SAMPLING`\nconfiguration variable to `ON`.\nSimilar to `omnitrace-sample`, use a double-hyphen (`--`) to separate the command-line arguments for `omnitrace-instrument` from the target application and its arguments.\n\n```shell\nomnitrace-instrument --help\nomnitrace-instrument \u003comnitrace-options\u003e -- \u003cexe-or-library\u003e \u003cexe-options\u003e\n```\n\n#### Binary Rewrite\n\nRewrite the text section of an executable or library with instrumentation:\n\n```shell\nomnitrace-instrument -o app.inst -- /path/to/app\n```\n\nIn binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries.\nExample of rewriting the functions starting with `\"hip\"` with instrumentation in the amdhip64 library:\n\n```shell\nmkdir -p ./lib\nomnitrace-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4\nexport LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}\n```\n\n\u003e ***Verify via `ldd` that your executable will load the instrumented library -- if you built your executable with***\n\u003e ***an RPATH to the original library's 
directory, then prefixing `LD_LIBRARY_PATH` will have no effect.***\n\nOnce you have rewritten your executable and/or libraries with instrumentation, you can run the instrumented executable,\nor an executable which loads the instrumented libraries, normally, e.g.:\n\n```shell\nomnitrace-run -- ./app.inst\n```\n\nIf you want to re-define certain settings to new defaults in a binary rewrite, use the `--env` option. This `omnitrace-instrument` option\nwill set the environment variable to the given value but will not override a value that is already set in the environment. E.g. the default value of `OMNITRACE_PERFETTO_BUFFER_SIZE_KB`\nis 1024000 KB (1 GiB):\n\n```shell\n# buffer size defaults to 1024000\nomnitrace-instrument -o app.inst -- /path/to/app\nomnitrace-run -- ./app.inst\n```\n\nPassing `--env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000` will change the default value in `app.inst` to 5120000 KB (5 GiB):\n\n```shell\n# defaults to 5 GiB buffer size\nomnitrace-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app\nomnitrace-run -- ./app.inst\n```\n\nThe new default can still be overridden at run time, either on the command line or through the environment:\n\n```shell\n# override default 5 GiB buffer size to 200 MB via command-line\nomnitrace-run --trace-buffer-size=200000 -- ./app.inst\n# override default 5 GiB buffer size to 200 MB via environment\nexport OMNITRACE_PERFETTO_BUFFER_SIZE_KB=200000\nomnitrace-run -- ./app.inst\n```\n\n#### Runtime Instrumentation\n\nRuntime instrumentation will not only instrument the text section of the executable but also the text sections of the\nlinked libraries. 
Thus, it may be useful to exclude those libraries via the `-ME` (module exclude) regex option\nor exclude specific functions with the `-E` regex option.\n\n```shell\nomnitrace-instrument -- /path/to/app\nomnitrace-instrument -ME '^(libhsa-runtime64|libz\\\\.so)' -- /path/to/app\nomnitrace-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app\n```\n\n### Python Profiling and Tracing\n\nUse the `omnitrace-python` script to profile/trace Python interpreter function calls.\nUse a double-hyphen (`--`) to separate the command-line arguments for `omnitrace-python` from the target script and its arguments.\n\n```shell\nomnitrace-python --help\nomnitrace-python \u003comnitrace-options\u003e -- \u003cpython-script\u003e \u003cscript-args\u003e\nomnitrace-python -- ./script.py\n```\n\nPlease note, the first argument after the double-hyphen *must be a Python script*, e.g. `omnitrace-python -- ./script.py`.\n\nIf you need to select a specific Python interpreter version, use `omnitrace-python-X.Y` where `X.Y` is the Python\nmajor and minor version:\n\n```shell\nomnitrace-python-3.8 -- ./script.py\n```\n\nIf you need to specify the full path to a Python interpreter, set the `PYTHON_EXECUTABLE` environment variable:\n\n```shell\nPYTHON_EXECUTABLE=/opt/conda/bin/python omnitrace-python -- ./script.py\n```\n\nIf you want to restrict the data collection to specific function(s) and their callees, pass the `-b` / `--builtin` option after decorating the\nfunction(s) with `@profile`. Use the `@noprofile` decorator for excluding/ignoring function(s) and their callees:\n\n```python\ndef foo():\n    pass\n\n@noprofile\ndef bar():\n    foo()\n\n@profile\ndef spam():\n    foo()\n    bar()\n```\n\nEach time `spam` is called during profiling, the profiling results will include 1 entry for `spam` and 1 entry\nfor `foo` via the direct call within `spam`. 
There will be no entries for `bar` or the `foo` invocation within it.\n\n### Trace Visualization\n\n- Visit [ui.perfetto.dev](https://ui.perfetto.dev) in a web browser\n- Select \"Open trace file\" from the panel on the left\n- Locate the omnitrace perfetto output (extension: `.proto`)\n\n![omnitrace-perfetto](docs/data/omnitrace-perfetto.png)\n\n![omnitrace-rocm](docs/data/omnitrace-rocm.png)\n\n![omnitrace-rocm-flow](docs/data/omnitrace-rocm-flow.png)\n\n![omnitrace-user-api](docs/data/omnitrace-user-api.png)\n\n## Using Perfetto tracing with System Backend\n\nPerfetto tracing with the system backend supports multiple processes writing to the same\noutput file. Thus, it is a useful technique if Omnitrace is built with partial MPI support\nbecause all the perfetto output will be coalesced into a single file. The\ninstallation docs for perfetto can be found in the [perfetto build instructions](https://perfetto.dev/docs/contributing/build-instructions).\nIf you are building omnitrace from source, you can configure CMake with `OMNITRACE_INSTALL_PERFETTO_TOOLS=ON`\nand the `perfetto` and `traced` applications will be installed as part of the build process. 
However,\nto prevent this option from accidentally overwriting an existing perfetto install,\nall the perfetto executables installed by omnitrace are prefixed with `omnitrace-perfetto-`, except for the `perfetto`\nexecutable, which is just renamed `omnitrace-perfetto`.\n\nStart `traced` and `perfetto` in the background:\n\n```shell\npkill traced\ntraced --background\nperfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background\n```\n\n\u003e ***NOTE: if the perfetto tools were installed by omnitrace, replace `traced` with `omnitrace-perfetto-traced` and***\n\u003e ***`perfetto` with `omnitrace-perfetto`.***\n\nConfigure omnitrace to use the perfetto system backend via the `--perfetto-backend` option of `omnitrace-run`:\n\n```shell\n# enable sampling on the uninstrumented binary\nomnitrace-run --sample --trace --perfetto-backend=system -- ./myapp\n# instrument the binary, then trace the instrumented binary\nomnitrace-instrument -o ./myapp.inst -- ./myapp\nomnitrace-run --trace --perfetto-backend=system -- ./myapp.inst\n```\n\nor via the `--env` option of `omnitrace-instrument` + runtime instrumentation:\n\n```shell\nomnitrace-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Fomnitrace","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frocm%2Fomnitrace","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Fomnitrace/lists"}