{"id":13478371,"url":"https://github.com/flamegraph-rs/flamegraph","last_synced_at":"2025-05-13T15:02:35.304Z","repository":{"id":37749962,"uuid":"174381831","full_name":"flamegraph-rs/flamegraph","owner":"flamegraph-rs","description":"Easy flamegraphs for Rust projects and everything else, without Perl or pipes \u003c3","archived":false,"fork":false,"pushed_at":"2025-05-03T07:36:32.000Z","size":1089,"stargazers_count":5174,"open_issues_count":63,"forks_count":161,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-05-05T22:17:03.286Z","etag":null,"topics":["flamegraphs","perf","profiling"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flamegraph-rs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-07T16:31:30.000Z","updated_at":"2025-05-05T17:07:00.000Z","dependencies_parsed_at":"2022-07-14T23:00:36.262Z","dependency_job_id":"d0bb9194-2887-4357-a485-6e523a6d6718","html_url":"https://github.com/flamegraph-rs/flamegraph","commit_stats":{"total_commits":244,"total_committers":65,"mean_commits":3.753846153846154,"dds":0.7868852459016393,"last_synced_commit":"8792e0927c43a53c80584948622ce7be259e60e6"},"previous_names":["ferrous-systems/cargo-flamegraph"],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flamegraph-rs%2Fflamegraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flamegraph-rs%2Fflamegraph/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flamegraph-rs%2Fflamegraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flamegraph-rs%2Fflamegraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flamegraph-rs","download_url":"https://codeload.github.com/flamegraph-rs/flamegraph/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253601000,"owners_count":21934246,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flamegraphs","perf","profiling"],"created_at":"2024-07-31T16:01:56.126Z","updated_at":"2025-05-13T15:02:35.254Z","avatar_url":"https://github.com/flamegraph-rs.png","language":"Rust","readme":"# [cargo-]flamegraph\n\n[![colorized flamegraph output](example_cropped.png)](example.svg)\n\nA Rust-powered flamegraph generator with additional support for\nCargo projects! It can be used to profile anything,\nnot just Rust projects! No perl or pipes required \u003c3\n\nBuilt on top of [@jonhoo's](https://github.com/jonhoo) wonderful [Inferno](https://github.com/jonhoo/inferno) all-rust flamegraph generation library!\n\n\u003e [!TIP]\n\u003e You might want to also try [samply](https://github.com/mstange/samply), which provides a more interactive UI\n\u003e using a seamless integration with Firefox's Profiler web UI. 
It is also written in Rust and has better macOS support.\n\n## Quick Start\n\nInstall it, and run\n\n```bash\n# Rust projects\ncargo flamegraph\n\n# Arbitrary binaries\nflamegraph -- /path/to/binary\n```\n\nHow to use flamegraphs: [what's a flamegraph, and how can I use it to guide systems performance work?](#systems-performance-work-guided-by-flamegraphs)\n\n## Installation\n\n\\[cargo-\\]flamegraph supports\n\n- [Linux](#linux): relies on `perf`\n- [MacOS](#macos): relies on `xctrace`\n- [Windows](#windows): native support with the [blondie](https://github.com/nico-abram/blondie) library; also works with `dtrace` on Windows\n\n`cargo install flamegraph` will make the `flamegraph` and `cargo-flamegraph` binaries available in\nyour cargo binary directory. On most systems this is something like `~/.cargo/bin`.\n\n## Linux\n\n**Note**: If you're using lld or mold on Linux, you must use the `--no-rosegment` flag. Otherwise perf will not be able to generate accurate stack traces ([explanation](https://crbug.com/919499#c16)). For example, for lld:\n\n```toml\n[target.x86_64-unknown-linux-gnu]\nlinker = \"/usr/bin/clang\"\nrustflags = [\"-Clink-arg=-fuse-ld=lld\", \"-Clink-arg=-Wl,--no-rosegment\"]\n```\n\nand for mold:\n\n```toml\n[target.x86_64-unknown-linux-gnu]\nlinker = \"clang\"\nrustflags = [\"-Clink-arg=-fuse-ld=/usr/local/bin/mold\", \"-Clink-arg=-Wl,--no-rosegment\"]\n```\n\n#### Debian (x86 and aarch)\n\n**Note**: Debian bullseye (the current stable version as of 2022) packages an outdated version of Rust which does not meet flamegraph's requirements. 
You should use [rustup](https://rustup.rs/) to install an up-to-date version of Rust, or upgrade to Debian bookworm (the current testing version) or newer.\n\n```bash\nsudo apt install -y linux-perf\n```\n\n#### Ubuntu (x86)\n\nNot working on aarch, use a Debian distribution, or make a PR with your solution for Ubuntu\n\n```bash\nsudo apt install linux-tools-common linux-tools-generic linux-tools-`uname -r`\n```\n\n#### Ubuntu/Ubuntu MATE (Raspberry Pi)\n\n```bash\nsudo apt install linux-tools-raspi\n```\n\n#### Pop!\\_OS\n\n```bash\nsudo apt install linux-tools-common linux-tools-generic\n```\n\n## Windows\n\n#### Blondie Backend\n\nThis is enabled by default.\nWindows is supported out-of-the-box, thanks to [Nicolas Abram](https://github.com/nico-abram)'s excellent [blondie](https://github.com/nico-abram/blondie) library.\n\n#### DTrace on Windows\n\nAlternatively, one can [install DTrace on Windows](https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/dtrace). If found, flamegraph will always prefer using `dtrace` over the built-in Windows support.\n\n## Shell auto-completion\n\nAt the moment, only `flamegraph` supports auto-completion. Supported shells are `bash`, `fish`, `zsh`, `powershell` and `elvish`.\n`cargo-flamegraph` does not support auto-completion because it is not as straight-forward to implement for custom cargo subcommands. 
See [#153](https://github.com/flamegraph-rs/flamegraph/pull/153) for details.\n\nHow you enable auto-completion depends on your shell, e.g.\n\n```bash\nflamegraph --completions bash \u003e $XDG_CONFIG_HOME/bash_completion # or /etc/bash_completion.d/\n```\n\n## Examples\n\n```bash\n# if you'd like to profile an arbitrary executable:\nflamegraph [-o my_flamegraph.svg] -- /path/to/my/binary --my-arg 5\n\n# or if the executable is already running, you can provide the PID via `-p` (or `--pid`) flag:\nflamegraph [-o my_flamegraph.svg] --pid 1337\n\n# NOTE: By default, perf tries to compute which functions are\n# inlined at every stack frame for every sample. This can take\n# a very long time (see https://github.com/flamegraph-rs/flamegraph/issues/74).\n# If you don't want this, you can pass --no-inline to flamegraph:\nflamegraph --no-inline [-o my_flamegraph.svg] /path/to/my/binary --my-arg 5\n\n# cargo support provided through the cargo-flamegraph binary!\n# defaults to profiling cargo run --release\ncargo flamegraph\n\n# by default, `--release` profile is used,\n# but you can override this:\ncargo flamegraph --dev\n\n# if you'd like to profile a specific binary:\ncargo flamegraph --bin=stress2\n\n# if you want to pass arguments as you would with cargo run:\ncargo flamegraph -- my-command --my-arg my-value -m -f\n\n# if you want to use interesting perf or dtrace options, use `-c`\n# this is handy for correlating things like branch-misses, cache-misses,\n# or anything else available via `perf list` or dtrace for your system\ncargo flamegraph -c \"record -e branch-misses -c 100 --call-graph lbr -g\"\n\n# Run criterion benchmark\n# Note that the last --bench is required for `criterion 0.3` to run in benchmark mode, instead of test mode.\ncargo flamegraph --bench some_benchmark --features some_features -- --bench\n\ncargo flamegraph --example some_example --features some_features\n\n# Profile unit tests.\n# Note that a separating `--` is necessary if `--unit-test` is the 
last flag.\ncargo flamegraph --unit-test -- test::in::package::with::single::crate\ncargo flamegraph --unit-test crate_name -- test::in::package::with::multiple::crate\ncargo flamegraph --unit-test --dev test::may::omit::separator::if::unit::test::flag::not::last::flag\n\n# Profile integration tests.\ncargo flamegraph --test test_name\n```\n\n## Usage\n\n`flamegraph` is quite simple. `cargo-flamegraph` is more sophisticated:\n\n```\nUsage: cargo flamegraph [OPTIONS] [-- \u003cTRAILING_ARGUMENTS\u003e...]\n\nArguments:\n  [TRAILING_ARGUMENTS]...  Trailing arguments passed to the binary being profiled\n\nOptions:\n      --dev                            Build with the dev profile\n      --profile \u003cPROFILE\u003e              Build with the specified profile\n  -p, --package \u003cPACKAGE\u003e              package with the binary to run\n  -b, --bin \u003cBIN\u003e                      Binary to run\n      --example \u003cEXAMPLE\u003e              Example to run\n      --test \u003cTEST\u003e                    Test binary to run (currently profiles the test harness and all tests in the binary)\n      --unit-test [\u003cUNIT_TEST\u003e]        Crate target to unit test, \u003cunit-test\u003e may be omitted if crate only has one target (currently profiles the test harness and all tests in the binary; test selection can be passed as trailing arguments after `--` as separator)\n      --bench \u003cBENCH\u003e                  Benchmark to run\n      --manifest-path \u003cMANIFEST_PATH\u003e  Path to Cargo.toml\n  -f, --features \u003cFEATURES\u003e            Build features to enable\n      --no-default-features            Disable default features\n  -r, --release                        No-op. 
For compatibility with `cargo run --release`\n  -v, --verbose                        Print extra output to help debug problems\n  -o, --output \u003cOUTPUT\u003e                Output file [default: flamegraph.svg]\n      --open                           Open the output .svg file with default program\n      --root                           Run with root privileges (using `sudo`)\n  -F, --freq \u003cFREQUENCY\u003e               Sampling frequency in Hz [default: 997]\n  -c, --cmd \u003cCUSTOM_CMD\u003e               Custom command for invoking perf/dtrace\n      --deterministic                  Colors are selected such that the color of a function does not change between runs\n  -i, --inverted                       Plot the flame graph up-side-down\n      --reverse                        Generate stack-reversed flame graph\n      --notes \u003cSTRING\u003e                 Set embedded notes in SVG\n      --min-width \u003cFLOAT\u003e              Omit functions smaller than \u003cFLOAT\u003e pixels [default: 0.01]\n      --image-width \u003cIMAGE_WIDTH\u003e      Image width in pixels\n      --palette \u003cPALETTE\u003e              Color palette [possible values: hot, mem, io, red, green, blue, aqua, yellow, purple, orange, wakeup, java, perl, js, rust]\n      --skip-after \u003cFUNCTION\u003e          Cut off stack frames below \u003cFUNCTION\u003e; may be repeated\n      --flamechart                     Produce a flame chart (sort by time, do not merge stacks)\n      --ignore-status                  Ignores perf's exit code\n      --no-inline                      Disable inlining for perf script because of performance issues\n      --post-process \u003cPOST_PROCESS\u003e    Run a command to process the folded stacks, taking the input from stdin and outputting to stdout\n  -h, --help                           Print help\n  -V, --version                        Print version\n```\n\nThen open the resulting `flamegraph.svg` with a browser, because most 
image\nviewers do not support interactive svg-files.\n\n## Enabling perf for use by unprivileged users\n\nTo enable perf without running as root, you may\nlower the `perf_event_paranoid` value in proc\nto an appropriate level for your environment.\nThe most permissive value is `-1` but may not\nbe acceptable for your security needs etc...\n\n```bash\necho -1 | sudo tee /proc/sys/kernel/perf_event_paranoid\n```\n\n## Improving output when running with `--release`\n\nDue to optimizations etc... sometimes the quality\nof the information presented in the flamegraph will\nsuffer when profiling release builds.\n\nTo counter this to some extent, you may either set the following in your\n`Cargo.toml` file:\n\n```\n[profile.release]\ndebug = true\n```\n\nOr set the environment variable [CARGO_PROFILE_RELEASE_DEBUG=true](https://doc.rust-lang.org/cargo/reference/config.html#profilenamedebug).\n\nPlease note that tests, unit tests and benchmarks use the `bench` profile in release mode (see [here](https://doc.rust-lang.org/cargo/reference/profiles.html#profile-selection)).\n\n## Usage with benchmarks\n\nIn order to perf existing benchmarks, you should set up a few configs.\nSet the following in your `Cargo.toml` file to run benchmarks:\n\n```\n[profile.bench]\ndebug = true\n```\n\n## Use custom paths for perf and dtrace\n\nIf `PERF` or `DTRACE` environment variable is set,\nit'll be used as corresponding tool command.\nFor example, to use `perf` from `~/bin`:\n\n```bash\nenv PERF=~/bin/perf flamegraph /path/to/my/binary\n```\n\n## Use custom `addr2line` binary for perf\n\nIt has been reported that `addr2line` can run very slowly in several issues ([#74][i74], [#199][i199], [#294][i294]). One solution is to use [gimli-rs/addr2line](https://github.com/gimli-rs/addr2line) instead of the system `addr2line` binary. 
This is suggested in [this comment](https://github.com/flamegraph-rs/flamegraph/issues/74#issuecomment-1909417039), and you can follow the steps below to set it up:\n\n[i74]: https://github.com/flamegraph-rs/flamegraph/issues/74\n[i199]: https://github.com/flamegraph-rs/flamegraph/issues/199\n[i294]: https://github.com/flamegraph-rs/flamegraph/issues/294\n\n```bash\ncargo install addr2line --features=bin\n```\n\n# Systems Performance Work Guided By Flamegraphs\n\nFlamegraphs are used to visualize where time is being spent\nin your program. Many times per second, the threads in a\nprogram are interrupted and the current location in your\ncode (based on the thread's instruction pointer) is recorded,\nalong with the chain of functions that were called to get there.\nThis is called stack sampling. These samples are then\nprocessed and stacks that share common functions are\nadded together. Then an SVG is generated showing the\ncall stacks that were measured, widened to the proportion\nof all stack samples that contained them.\n\nThe **y-axis** shows the stack depth number. When looking at a\nflamegraph, the main function of your program will be closer to\nthe bottom, and the called functions will be stacked on top,\nwith the functions that they call stacked on top of them, etc...\n\nThe **x-axis** spans all of the samples. It does _not_ show the\npassing of time from left to right. The left to right ordering\nhas no meaning.\n\nThe **width** of each box shows the total time that that\nfunction is on the CPU or is part of the call stack. 
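The fold-and-widen step described above can be made concrete. Below is a rough, hypothetical Rust sketch (stack names and counts are invented; the `stack count` text it prints mirrors the folded-stack format consumed by inferno-style tools, not any exact internal API):

```rust
use std::collections::BTreeMap;

fn main() {
    // Hypothetical stack samples, one per sampling interrupt,
    // written root-first with ';' between frames.
    let samples = [
        "main;render;rasterize",
        "main;render;rasterize",
        "main;render;layout",
        "main;parse",
    ];

    // Fold: samples whose stacks are identical are added together.
    let mut folded: BTreeMap<&str, usize> = BTreeMap::new();
    for s in samples {
        *folded.entry(s).or_insert(0) += 1;
    }

    // Each stack's box width is its share of all samples.
    let total = samples.len() as f64;
    for (stack, count) in &folded {
        let width_pct = 100.0 * *count as f64 / total;
        println!("{stack} {count} ({width_pct}% of the x-axis)");
    }
}
```

Real profilers gather thousands of samples per second, but the principle is the same: a function's width is simply the fraction of samples whose stacks contain it.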
If a\nfunction's box is wider than others, that means that it consumes\nmore CPU per execution than other functions, or that it is\ncalled more than other functions.\n\nThe **color** of each box isn't significant, and is chosen at\nrandom.\n\nFlamegraphs are good for visualizing where the most\nexpensive parts of your program are at runtime,\nwhich is wonderful because...\n\n## Humans are terrible at guessing about performance!\n\nEspecially people who come to Rust from C and C++ will\noften over-optimize things in code that LLVM is able to\noptimize away on its own. It's always better to write\nRust in a clear and obvious way, before beginning\nmicro-optimizations, allocation-minimization, etc...\n\nLots of things that would seem like they would have terrible\nperformance are actually cheap or free in Rust. Closures\nare fast. Initialization on the stack before moving\nto a `Box` is often compiled away. Clones are often\ncompiled away. So, `clone()` away instead of fighting\nfor too long to get the compiler to be happy about\nownership!\n\nThen make a flamegraph to see if any of that was\nactually expensive.\n\n## Flamegraphs Are the Beginning, Not the End\n\nFlamegraphs show you the things that are taking up time, but they\nare a sampling technique to be used for high-level and initial\nlooks at the system under measurement. They are great for finding\nthe things to look into more closely, and often it will be\nobvious how to improve something based on its flamegraph, but\nthey are really more for choosing the target to perform optimization\non than an optimization measurement tool in itself. 
They are\ncoarse-grained, and difficult to diff (although\n[this may be supported soon](https://github.com/jonhoo/inferno/issues/62)).\nAlso, because flamegraphs are based on the proportion of total time\nthat something takes, if you accidentally make something else\nreally slow, all other things will appear smaller on the flamegraph:\neven though the entire program runtime is much slower, the items you\nwere hoping to optimize will look smaller.\n\nIt is a good idea to use flamegraphs to figure out what you want to\noptimize, and then set up a measurement environment that allows\nyou to determine that an improvement has actually happened.\n\n- use flamegraphs to find a set of optimization targets\n- create benchmarks for these optimization targets, and if\n  appropriate use something like cachegrind and cg_diff to\n  [measure cpu instructions](https://github.com/spacejam/sled/blob/d521c510c3b8a7e02b8602d6db6a7701b51bd33b/hack/instructions#L26)\n  and diff them against the previous version.\n- Measuring CPU instructions is often better than measuring the time it takes\n  to run a workload, because it's possible that a\n  background task on your machine ran and caused something to slow down\n  in terms of physical time, but if you actually made an implementation\n  faster, it is likely to have a stronger correlation with reduced total\n  CPU instructions.\n- Time spent on the CPU is not the full picture, as time is spent\n  waiting for IO to complete as well, which does not get accounted\n  for by tools like perf that only measure what's consuming time\n  on the CPU. 
Check out [Brendan Gregg's article on Off-Cpu\n  Accounting](http://www.brendangregg.com/offcpuanalysis.html)\n  for more information about this!\n\n## Performance Theory 101: Basics of Quantitative Engineering\n\n- Use realistic workloads on realistic hardware, or your data doesn't\n  necessarily correspond very much with what will be happening in production\n- All of our guesses are wrong to some extent, so we have to measure\n  the effects of our work. Often the simple code that doesn't seem\n  like it should be fast is actually way faster than code that looks\n  optimized. We need to measure our optimizations to make sure that we\n  didn't make our code both harder to read AND slower.\n- Measure before you change anything, and save the results\n  in a safe place! Many profiling tools will overwrite their old output\n  when you run them again, so make sure you take care to save the\n  data before you begin so that you can compare before and after.\n- Take measurements on a warmed up machine that isn't doing anything\n  else, and has had time to cool off from the last workload.\n  CPUs will fall asleep and drop into power-saving modes when idle,\n  and they will also throttle if they get too hot (sometimes SIMD\n  can cause things to run slower because it heats things up so much\n  that the core has to throttle).\n\n## Performance Theory 202: USE Method\n\nThe USE Method is a way to very quickly locate performance\nproblems while minimizing discovery efforts. It's more about\nfinding production issues than flamegraphs directly, but it's\na great technique to have in your toolbox if you are going to\nbe doing performance triage, and flamegraphs can be helpful\nfor identifying the components to then drill down into queue\nanalysis for.\n\nEverything in a computer can be thought of as a resource\nwith a queue in front of it, which can serve one or more\nrequests at a time. 
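Little's Law from queueing theory (listed under Performance Laws below) puts rough numbers on that resource-with-a-queue picture; here is a hypothetical back-of-the-envelope sketch (all numbers invented):

```rust
fn main() {
    // Hypothetical service: requests arrive at a steady 200 per second,
    // and each spends 50 ms in the system (queueing plus service time).
    let arrival_rate_per_sec = 200.0_f64;
    let time_in_system_sec = 0.050_f64;

    // Little's Law: mean number of requests in the system, L = lambda * W.
    // Here: 200 * 0.05 = 10 requests in flight on average.
    let mean_in_system = arrival_rate_per_sec * time_in_system_sec;
    println!("mean requests in system: {mean_in_system}");

    // If only 8 requests can be serviced at once, the remainder is
    // saturation: requests sitting in the queue, waiting.
    let service_slots = 8.0_f64;
    let mean_waiting = (mean_in_system - service_slots).max(0.0);
    println!("mean requests waiting: {mean_waiting}");
}
```

The same arithmetic works in reverse: if you can measure occupancy and arrival rate, you can infer the mean time requests spend waiting.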
The various systems in our computers\nand programs can do a certain amount of work over time\nbefore requests start to pile up and wait in a line\nuntil they are able to be serviced.\n\nSome resources can handle more and more work without degrading\nin performance until they hit their maximum utilization point.\nNetwork devices can be thought of as working in this way to\na large extent. Other resources start to saturate long before\nthey hit their maximum utilization point, like disks.\n\nDisks (especially spinning disks, but even SSDs) will do more and\nmore work if you allow more work to queue up until they hit their\nmaximum throughput for a workload, but the latency per request\nwill go up before it hits 100% utilization because the disk will\ntake longer before it can begin servicing each request. Tuning disk\nperformance often involves measuring the various IO queue depths to\nmake sure they are high enough to get nice throughput but not so\nhigh that latency becomes undesirable.\n\nAnyway, nearly everything in our systems can be broken down\nto be analyzed based on 3 high-level characteristics:\n\n- **Utilization** is the amount of time the system under\n  measurement is actually doing useful work servicing a request,\n  and can be measured as the percent of available time spent servicing\n  requests\n- **Saturation** is when requests have to wait before being\n  serviced. This can be measured as the queue depth over time\n- **Errors** are when things start to fail, like when queues\n  are no longer able to accept any new requests - like when a TCP connection\n  is rejected because the system's TCP backlog is already full of\n  connections that have not yet been accept'ed by the userspace\n  program.\n\nThis forms the necessary background to start applying the USE Method\nto locate the performance-related issue in your complex system!\n\nThe approach is:\n\n1. 
Enumerate the various resources that might be behaving poorly - maybe by creating a\n   flamegraph and looking for functions that are taking more of the total runtime than expected\n1. Pick one of them\n1. (Errors) Check for errors like TCP connection failures, other IO failures, bad things in logs etc...\n1. (Utilization) Measure the utilization of the system and see if its throughput is approaching\n   the known maximum, or the point that it is known to experience saturation\n1. (Saturation) Is saturation actually happening? Are requests waiting in lines before being serviced?\n   Is latency going up while throughput is staying the same?\n\nThese probing questions serve as a piercing flashlight for\nrapidly identifying the underlying issue most of the time.\n\nIf you want to learn more about this, check out Brendan Gregg's\n[blog post](http://www.brendangregg.com/usemethod.html) on it.\nI tend to recommend that anyone who is becoming an SRE should\nmake Brendan's\n[Systems Performance](http://www.brendangregg.com/sysperfbook.html)\nbook one of the first things they read to understand how to\nmeasure these things quickly in production systems.\n\nThe USE Method derives from an area of study called\n[queue theory](https://en.wikipedia.org/wiki/Queueing_theory)\nwhich has had a huge impact on the world of computing,\nas well as many other logistical endeavors that humans\nhave undertaken.\n\n## Performance Laws\n\nIf you want to drill more into theory, know the law(s)!\n\n- [Universal Law of Scalability](http://www.perfdynamics.com/Manifesto/USLscalability.html)\n  is about the relationship between concurrency gains, queuing and coordination costs\n- [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law)\n  is about the theoretical maximum gain that can be made for a workload by parallelization.\n- [Little's Law](https://en.wikipedia.org/wiki/Little%27s_law)\n  is a deceptively simple law with some subtle implications from queue theory\n  that allows us to 
reason about appropriate queue lengths for our systems\n","funding_links":[],"categories":["Rust","web shell, shellcode","17. Troubleshooting Guide","Other"],"sub_categories":["Network Services_Other","17.4 Performance Bottlenecks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflamegraph-rs%2Fflamegraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflamegraph-rs%2Fflamegraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflamegraph-rs%2Fflamegraph/lists"}