{"id":20515960,"url":"https://github.com/zyedidia/perforator","last_synced_at":"2026-03-10T14:34:08.580Z","repository":{"id":49551486,"uuid":"325920614","full_name":"zyedidia/perforator","owner":"zyedidia","description":"Record \"perf\" performance metrics for individual functions/regions of an ELF binary.","archived":false,"fork":false,"pushed_at":"2024-01-17T04:01:51.000Z","size":123,"stargazers_count":81,"open_issues_count":5,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2026-01-26T13:48:13.415Z","etag":null,"topics":["benchmark","go","linux","perf","performance","profiling","profiling-functions","tracing"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zyedidia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-01T05:08:45.000Z","updated_at":"2025-07-09T10:45:07.000Z","dependencies_parsed_at":"2024-06-19T00:39:21.961Z","dependency_job_id":null,"html_url":"https://github.com/zyedidia/perforator","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/zyedidia/perforator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyedidia%2Fperforator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyedidia%2Fperforator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyedidia%2Fperforator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyedidia%2Fperforator/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zyedidia","download_url":"https://codeload.github.com/zyedidia/perforator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyedidia%2Fperforator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30337297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T12:41:07.687Z","status":"ssl_error","status_checked_at":"2026-03-10T12:41:06.728Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","go","linux","perf","performance","profiling","profiling-functions","tracing"],"created_at":"2024-11-15T21:25:44.872Z","updated_at":"2026-03-10T14:34:08.562Z","avatar_url":"https://github.com/zyedidia.png","language":"Go","readme":"# Perforator\n\n[![Documentation](https://godoc.org/github.com/zyedidia/perforator?status.svg)](http://godoc.org/github.com/zyedidia/perforator)\n[![Go Report Card](https://goreportcard.com/badge/github.com/zyedidia/perforator)](https://goreportcard.com/report/github.com/zyedidia/perforator)\n[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/zyedidia/perforator/blob/master/LICENSE)\n\nPerforator is a tool for recording performance metrics over subregions of a\nprogram (e.g., functions) using the Linux \"perf\" interface. 
The `perf` tool\nprovided by the Linux kernel only supports collecting statistics over the\ncomplete lifetime of a program, which is often inconvenient when a program\nincludes setup and cleanup that should not be profiled along with the\nbenchmark. Perforator is not as comprehensive as `perf` but it allows you to\ncollect statistics for individual functions or address ranges.\n\nPerforator only supports Linux AMD64. The target ELF binary may be generated\nfrom any language. For function lookup, make sure the binary is not stripped\n(it must contain a symbol table), and for additional information (source code\nregions, inlined function lookup), the binary must include DWARF information.\nPerforator supports position-independent binaries.\n\nPerforator is primarily intended to be used as a CLI tool, but includes a\nlibrary for more general user-code tracing called `utrace`, a library for\nreading ELF/DWARF information from executables, and a library for tracing perf\nevents in processes.\n\n# Installation\n\nThere are three ways to install Perforator.\n\n1. Download the prebuilt binary from the [releases](https://github.com/zyedidia/perforator/releases) page. Using [Eget](https://github.com/zyedidia/eget):\n\n```\neget zyedidia/perforator\n```\n\n2. Install from source:\n\n```\ngit clone https://github.com/zyedidia/perforator\ncd perforator\nmake build # or make install to install to $GOBIN\n```\n\n3. 
Install with `go get` (version info will be missing):\n\n```\ngo get github.com/zyedidia/perforator/cmd/perforator\n```\n\n# Usage\n\nFirst make sure that you have the appropriate permissions to record the events\nyou are interested in (this may require running Perforator with `sudo` or\nmodifying `/proc/sys/kernel/perf_event_paranoid` -- see [this\npost](https://superuser.com/questions/980632/run-perf-without-root-rights)).\nIf Perforator still can't find any events, double check that your system\nsupports the `perf_event_open` system call (try installing the `perf` tool from\nthe Linux kernel).\n\n### Options\n\n```\nUsage:\n  perforator [OPTIONS] COMMAND [ARGS]\n\nApplication Options:\n  -l, --list=         List available events for {hardware, software, cache, trace} event types\n  -e, --events=       Comma-separated list of events to profile\n  -g, --group=        Comma-separated list of events to profile together as a group\n  -r, --region=       Region(s) to profile: 'function' or 'start-end'; start/end locations may be file:line or hex addresses\n      --kernel        Include kernel code in measurements\n      --hypervisor    Include hypervisor code in measurements\n      --exclude-user  Exclude user code from measurements\n  -s, --summary       Instead of printing results immediately, show an aggregated summary afterwards\n      --sort-key=     Key to sort summary tables with\n      --reverse-sort  Reverse summary table sorting\n      --csv           Write summary output in CSV format\n  -o, --output=       Write summary output to file\n  -V, --verbose       Show verbose debug information\n  -v, --version       Show version information\n  -h, --help          Show this help message\n```\n\n### Example\n\nSuppose we had a C function that summed an array and wanted to benchmark it for\nsome large array of numbers. 
We could write a small benchmark program like so:\n\n```c\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n#include \u003ctime.h\u003e\n#include \u003cstdint.h\u003e\n\n#define SIZE 10000000\n\nuint64_t sum(uint32_t* numbers) {\n    uint64_t sum = 0;\n    for (int i = 0; i \u003c SIZE; i++) {\n        sum += numbers[i];\n    }\n    return sum;\n}\n\nint main() {\n    srand(time(NULL));\n    uint32_t* numbers = malloc(SIZE * sizeof(uint32_t));\n    for (int i = 0; i \u003c SIZE; i++) {\n        numbers[i] = rand();\n    }\n\n    uint64_t result = sum(numbers);\n    printf(\"%lu\\n\", result);\n    return 0;\n}\n```\n\nIf we want to determine the number of cache misses, branch mispredictions, etc... `perf`\nis not suitable because running `perf stat` on this program will profile the creation of\nthe array in addition to the sum. With Perforator, we can measure just the sum.\n\n### Profiling functions\n\nFirst compile with\n\n```\n$ gcc -g -O2 -o bench bench.c\n```\n\nNow we can measure with Perforator:\n\n```\n$ perforator -r sum ./bench\n+---------------------+-------------+\n| Event               | Count (sum) |\n+---------------------+-------------+\n| instructions        | 50000004    |\n| branch-instructions | 10000002    |\n| branch-misses       | 10          |\n| cache-references    | 1246340     |\n| cache-misses        | 14984       |\n| time-elapsed        | 4.144814ms  |\n+---------------------+-------------+\n10736533065142551\n```\n\nResults are printed immediately when the profiled function returns.\n\nNote: in this case we compiled with `-g` to include DWARF debugging\ninformation.  This was necessary because GCC will inline the call to `sum`, so\nPerforator needs to be able to read the DWARF information to determine where it\nwas inlined to. 
If you compile without `-g`, make sure the target function is\nnot being inlined (either you know it is not inlined, or you mark it with the\n`noinline` attribute).\n\nFun fact: clang does a better job optimizing this code than gcc. I tried\nrunning this example with clang instead and found it only had 1,250,000 branch\ninstructions (roughly 8x fewer than gcc!). The reason: vector instructions.\n\n### Events\n\nBy default, Perforator will measure some basic events such as instructions\nexecuted, cache references, cache misses, branches, and branch misses. You can\nspecify events yourself with the `-e` flag:\n\n```\n$ perforator -e l1d-read-accesses,l1d-read-misses -r sum ./bench\n+-------------------+-------------+\n| Event             | Count (sum) |\n+-------------------+-------------+\n| l1d-read-accesses | 10010311    |\n| l1d-read-misses   | 625399      |\n| time-elapsed      | 4.501523ms  |\n+-------------------+-------------+\n10736888439771461\n```\n\nTo view available events, use the `--list` flag:\n\n```\n$ perforator --list hardware # List hardware events\n$ perforator --list software # List software events\n$ perforator --list cache    # List cache events\n$ perforator --list trace    # List kernel trace events\n```\n\nDetailed documentation for each event is available in the manual page for\nPerforator.  See the `perforator.1` manual included with the prebuilt binary.\nThe `man` directory in the source code contains the Markdown source, which can\nbe compiled using Pandoc (via `make perforator.1`). 
You can also download the\nman page with [eget](https://github.com/zyedidia/eget): `eget -f perforator.1 zyedidia/perforator`.\n\n### Source code regions\n\nIn addition to profiling functions, you may profile regions specified by source\ncode ranges if your binary has DWARF debugging information.\n\n```\n$ perforator -r bench.c:18-bench.c:23 ./bench\n+---------------------+-------------------------------+\n| Event               | Count (bench.c:18-bench.c:23) |\n+---------------------+-------------------------------+\n| instructions        | 668794280                     |\n| branch-instructions | 169061639                     |\n| branch-misses       | 335360                        |\n| cache-references    | 945581                        |\n| cache-misses        | 3569                          |\n| time-elapsed        | 78.433272ms                   |\n+---------------------+-------------------------------+\n10737167007294257\n```\n\nOnly certain line numbers are available for breakpoints. The range is exclusive\non the upper bound, meaning that in the example above `bench.c:23` is not\nincluded in profiling.\n\nYou may also directly specify addresses as decimal or hexadecimal numbers. 
This\nis useful if you don't have DWARF information but you know the addresses you\nwant to profile (for example, by inspecting the disassembly via `objdump`).\n\n### Multiple regions\n\nYou can also profile multiple regions at once:\n\n```\n$ perforator -r bench.c:18-bench.c:23 -r sum -r main ./bench\n+---------------------+-------------------------------+\n| Event               | Count (bench.c:18-bench.c:23) |\n+---------------------+-------------------------------+\n| instructions        | 697120715                     |\n| branch-instructions | 162949718                     |\n| branch-misses       | 302849                        |\n| cache-references    | 823087                        |\n| cache-misses        | 3645                          |\n| time-elapsed        | 78.832332ms                   |\n+---------------------+-------------------------------+\n+---------------------+-------------+\n| Event               | Count (sum) |\n+---------------------+-------------+\n| instructions        | 49802557    |\n| branch-instructions | 10000002    |\n| branch-misses       | 9           |\n| cache-references    | 1246639     |\n| cache-misses        | 14382       |\n| time-elapsed        | 4.235705ms  |\n+---------------------+-------------+\n10739785644063349\n+---------------------+--------------+\n| Event               | Count (main) |\n+---------------------+--------------+\n| instructions        | 675150939    |\n| branch-instructions | 184259174    |\n| branch-misses       | 386503       |\n| cache-references    | 1128637      |\n| cache-misses        | 8368         |\n| time-elapsed        | 83.132829ms  |\n+---------------------+--------------+\n```\n\nIn this case, it may be useful to use the `--summary` option, which will\naggregate all results into a table that is printed when tracing stops.\n\n```\n$ perforator --summary -r bench.c:18-bench.c:23 -r sum -r main 
./bench\n10732787118410148\n+-----------------------+--------------+---------------------+---------------+------------------+--------------+--------------+\n| region                | instructions | branch-instructions | branch-misses | cache-references | cache-misses | time-elapsed |\n+-----------------------+--------------+---------------------+---------------+------------------+--------------+--------------+\n| bench.c:18-bench.c:23 | 718946520    | 172546336           | 326000        | 833098           | 3616         | 81.798381ms  |\n| main                  | 678365328    | 174259806           | 363737        | 1115394          | 4403         | 86.321344ms  |\n| sum                   | 43719896     | 10000002            | 9             | 1248069          | 16931        | 4.453342ms   |\n+-----------------------+--------------+---------------------+---------------+------------------+--------------+--------------+\n```\n\nYou can use the `--sort-key` and `--reverse-sort` options to modify which\ncolumns are sorted and how. In addition, you can use the `--csv` option to\nwrite the output table in CSV form.\n\nNote: to an astute observer, the results from the above table don't look very\naccurate.  In particular the totals for the main function seem questionable.\nThis is due to event multiplexing (explained more below), and for best results\nyou should not profile multiple regions simultaneously. In the table above, you\ncan see that it's likely that profiling for `main` was disabled while `sum` was\nrunning.\n\n### Groups\n\nThe CPU has a fixed number of performance counters. If you try recording more\nevents than there are counters, \"multiplexing\" will be performed to estimate\nthe totals for all the events. For example, if we record 6 events on the sum\nbenchmark, the instruction count becomes less stable. This is because the\nnumber of events now exceeds the number of hardware registers for counting, and\nmultiplexing occurs. 
To ensure that certain events are always counted together,\nyou can put them all in a group with the `-g` option. The `-g` option has the\nsame syntax as the `-e` option, but may be specified multiple times (for\nmultiple groups).\n\n# Notes and caveats\n\n* Tip: enable verbose mode with the `-V` flag when you are not seeing the\n  expected result.\n* Many CPUs expose additional/non-standardized raw perf events. Perforator does\n  not currently support those events.\n* Perforator has only limited support for multithreaded programs. It supports\n  profiling programs with multiple threads as long as each profiled region is\n  only ever run by one thread. In addition, the beginning\n  and end of a region must be run by the same thread. This means that if you are\n  benchmarking Go, you should call `runtime.LockOSThread` in your benchmark to\n  prevent a goroutine migration while profiling.\n* A region is either active or inactive; it cannot be active multiple times at\n  once. This means that for recursive functions, only the first invocation of the\n  function is tracked.\n* Beware of multiplexing, which occurs when you try to record more\n  events than there are hardware counter registers. In particular, if you\n  profile a function inside of another function being profiled, this will\n  likely result in multiplexing and possibly incorrect counts. Perforator will\n  automatically attempt to scale counts when multiplexing occurs. To see if\n  this has happened, use the `-V` flag, which will print information when\n  multiplexing is detected.\n* Be careful if your target functions are being inlined. Perforator will\n  automatically attempt to read DWARF information to determine the inline sites\n  for target functions, but it's a good idea to double-check if you are seeing\n  unexpected results. 
Use the `-V` flag to see where Perforator thinks the inline\n  site is.\n\n# How it works\n\nPerforator uses `ptrace` to trace the target program and enable profiling for\ncertain parts of the target program. Perforator places the `0xCC` \"interrupt\"\ninstruction at the beginning of the profiled function which allows it to regain\ncontrol when the function is executed. At that point, Perforator will place the\noriginal code back (whatever was initially overwritten by the interrupt byte),\ndetermine the return address by reading the top of the stack, and place an\ninterrupt byte at that address. Then Perforator will enable profiling and\nresume the target process. When the next interrupt happens, the target will\nhave reached the return address and Perforator can stop profiling, remove the\ninterrupt, and place a new interrupt back at the start of the function.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzyedidia%2Fperforator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzyedidia%2Fperforator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzyedidia%2Fperforator/lists"}