{"id":17965233,"url":"https://github.com/tokenrove/extrospect-beam","last_synced_at":"2025-03-25T06:31:13.551Z","repository":{"id":147613117,"uuid":"65306617","full_name":"tokenrove/extrospect-beam","owner":"tokenrove","description":"Tools for live extrospection of the Erlang BEAM VM — WARNING: early alpha","archived":false,"fork":false,"pushed_at":"2017-02-09T22:30:04.000Z","size":40,"stargazers_count":23,"open_issues_count":6,"forks_count":3,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-19T09:40:48.934Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tokenrove.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-09T15:32:08.000Z","updated_at":"2021-06-11T21:25:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"f8c3b3f4-e597-40b2-9a61-b8d6fe3372b7","html_url":"https://github.com/tokenrove/extrospect-beam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenrove%2Fextrospect-beam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenrove%2Fextrospect-beam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenrove%2Fextrospect-beam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenrove%2Fextrospect-beam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tokenrove","download_ur
l":"https://codeload.github.com/tokenrove/extrospect-beam/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245413707,"owners_count":20611353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T12:10:35.095Z","updated_at":"2025-03-25T06:31:13.544Z","avatar_url":"https://github.com/tokenrove.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# think-outside-the-beam\n\nThis is a collection of tools for unobtrusive introspection of a\nrunning Erlang VM under Linux.  It uses Linux-specific interfaces\n(perf events, `process_vm_readv(2)`) to avoid having to stop the process\nwith `ptrace(2)`.\n\nBecause these tools are necessarily approximate (see the WARNING\nsection), they should be used as a way to discover new directions for\nmore specific investigation, not as a source of truth.\n\n\n## WARNING\n\nThese tools make significant assumptions about the internals of the\nErlang VM they are introspecting.  They probably won't work on even\nslightly different versions of the VM.  They are also very specific to\nx86-64 and Linux presently.\n\nCompatible version: Erlang OTP 18.3 x86-64\n\nThese tools are also necessarily inaccurate.  First, perf itself may\nsample in a biased fashion; secondly, these tools read additional data\nfrom the VM, and may receive inconsistent or garbled views of the data\ntherein.  Use them only for developing hypotheses.\n\n## EXTRA WARNING\n\nThis software has only been used on a handful of fairly homogeneous\nsystems, by the author.  
It is alpha software that is almost certainly\nbroken in many subtle ways.\n\n\n## To build\n\nTry:\n\n```\n./build.sh\n```\n\nYou will need:\n - [ninja](https://ninja-build.org/) (\u003e= 1.5.1)\n - [meson](http://mesonbuild.com/) (\u003e= 0.33.0)\n - [liblzma](https://github.com/kobolabs/liblzma)\n - [gawk](https://www.gnu.org/software/gawk/)\n - [bison](https://www.gnu.org/software/bison/)\n - [flex](http://flex.sourceforge.net/)\n\nWe will eventually supply a script that verifies the constants chosen\nhere are consistent with the running BEAM internals.\n\nYou will have the best luck if you've built your Erlang system with at\nleast these flags:\n\n```\n-fvar-tracking-assignments -ggdb -g3 -gdwarf-4 -Wl,--build-id\n```\n\n\n## Standalone Tools\n\nMost of these tools by default look at all threads associated with a\nVM, but can be asked to isolate only a single PID.\n\nThose that have some amount of skid (accuracy of measurement) can be\nasked to print estimates of how far off they're likely to be\n(`--skid-summary`).\n\nAt some point, generating the perf.map will be done automatically, but\nfor the moment, it must be done manually.  So before running\n`erlang-sample`, you'll need to run `erlang-write-perf-map PID` where\n`PID` is the PID of your Erlang VM.  If it fails because of missing\nsymbols, you'll probably need to rebuild Erlang with debugging\noptions.\n\n\n### erlang-sample\n\nBy default, lists most frequently seen Erlang function calls as\nsampled from a running VM.\n\n#### `--pstack`\n\nPrints the stack of each running process on each scheduler.  By\ndefault, only those schedulers that are running.  
Options to wait for\neach scheduler; print only Erlang stack.\n\n#### `--blame`\n\nFor a given native function (like `erts_garbage_collect` or\n`copy_struct`), report those Erlang functions occurring most frequently\nin the stack trace for that function.\n\n### erlang-heapsample (coming soon)\n\nPrint heap information about currently running processes in Erlang VM.\n\n\n## Integrations\n### perf\n\nSee `vendor/perf`.  Still extremely nascent.\n\n### kcov\n\nComing soon, hopefully.\n\n\n## Questions and How to Answer Them (WIP)\n\n### What are the hottest Erlang functions?\n\nRun `erlang-sample` for a reasonable period of time, probably with the `--only-erlang` option.\n\n\n### What is allocating long-lived memory?\n\n\n### How much are NIFs impacting scheduling?\n\n\n### Where do expensive deep copies occur?\n\nTry `erlang-sample --blame copy_struct`.\n\n(when `copy_struct` is seen, dump the stack of the process involved)\n\n\n### How much RSS/vsize is lost to fragmentation?\n\n- compare maps and mbcs carriers, sbcs carriers to actual memory allocated\n\nThere may be some metric we can come up with for this, too.\n\n(total free - largest free block) / total free\n\nSee also `recon_alloc:fragmentation` for the in-VM approach to this.\n\n\nWe should also be able to compare memory usage to actual vsize of\nanonymous rw pages.\n\n\n### How is my workload distributed across schedulers?  
Across CPUs?\n\nGraph processes seen and percentages of other stuff, per scheduler\n\n\n## How this works\n\n### Erlang stacktraces using perf event sampling and process_vm_readv\n\nThere are two mechanisms by which we get information from the VM\nprocess: perf event sampling, which is done by the kernel\nsynchronously (as far as I know), and direct reading of the VM\nprocess's memory using `process_vm_readv`, which necessarily happens\nasynchronously and can present an inconsistent picture of the VM's\nstate.\n\n(See \"Why not ptrace or /proc/PID/mem?\" elsewhere in this\ndocumentation, if you just asked yourself that question.)\n\nWe're mostly concerned with getting samples when the native IP is\ninside `process_main`, although it doesn't hurt to get the most\naccurate backtrace possible even if we're in some child of\n`process_main` like `erts_garbage_collect`.\n\nIn `process_main`, we have a couple of variables that are particularly\nof interest.  There's `c_p`, which points to the current process.  We\ncan read all kinds of useful information from that structure, but\n(except with some dirty tricks that aren't generally applicable) the\ntime between a perf sample being made and us reading this information\ncould be very large (see other discussions in this documentation on\nskid about that).\n\nSo, if we can, we also want to sample `I` and `E`.  `I` points to the\ncurrent instruction, and `E` points to the top of the stack.  
If we\ncan get all of them, we can do a pretty good job of validating that\nthe trace we read from `c_p` is accurate with regards the perf sample.\n\n\n### Tell me about the dirty tricks that aren't generally applicable\n\nThis is probably one hack too far, but consider if we get perf to\nsample the stack of the following bit of code:\n\n```\n    spy_pid = syscall(__NR_gettid);\n    sched_setscheduler(spy_pid, SCHED_IDLE,\n                       \u0026(struct sched_param){.sched_priority=20});\n    asm volatile(\"\" ::: \"memory\");\n    /* this should probably be nanosleep, but since we destroyed our\n     * stack forever, we'd have to put the arguments in static storage\n     * or similar.  too much hassle for this prototype.  sched_yield\n     * shouldn't be _so_ bad if there are other jobs to run. */\n    asm volatile (\"forever:\\n\"\n                  \"movq %0, %%rsp\\n\"\n                  \"movl %1, %%eax\\n\"\n                  \"int $0x80\\n\"\n                  \"jmp forever\\n\"\n                  : : \"r\" (spy_target), \"r\" (__NR_sched_yield) : \"rsp\");\n    __builtin_unreachable();\n```\n\nThis allows us to sample memory from wherever we point `spy_target`.\n(For example, we could write a NIF that allows us to create these spy\nthreads in the VM and then read and write their `spy_target` with\n`process_vm_{readv,writev}`.)  So we might be able to use this to\nsample a single process with a higher level of accuracy than before,\nbut it would require some serious juggling and machinations that don't\nseem to be worth it.\n\nAt this point, if you're considering doing this, you probably just\nwant to extend perf's sampling mechanism in the kernel.  SystemTap or\nthe new BPF facilities probably are better places to aim for this.\n\n\n### Why not ptrace or /proc/PID/mem?\n\nIt's fairly well-known that in order to `ptrace`, we have to stop the\ntraced process.  
(Disclaimer of ignorance: I know that `PTRACE_SEIZE`\nexists as a Linuxism, but I don't know how much you can do in that\nstate without invoking `PTRACE_INTERRUPT`.)\n\nIt's less well-known that in order to read from `/proc/PID/mem`, the\nsame is true: the process must be stopped.\n\nMost of the systems I was interested in applying these techniques to\ncannot abide being stopped even briefly.\n\n\n## Troubleshooting\n\nIn general, the `--pstack` mode for `erlang-sample` is useful for\ntroubleshooting, since it prints full stack traces at a time and one\ncan easily see many common problems (such as all traces being a single\nentry deep, or no Erlang functions ever appearing).\n\n\n### `erlang-sample` can't find a register location for `c_p`\n\nIf the problem is just that the register information is more complex\nthan a single location (you can check with `dwarfdump`, `readelf` or\nsimilar), it's mostly a matter of making `erlang-sample` smarter.\n\nIf the location isn't there at all, though, (i.e. gdb gives the\ndreaded `(optimized out)` message when you do `info address c_p` when\nstopped in `process_main`) you can find this and other important\nregisters by looking at the disassembly of `process_main`.\n\nFor example, one can run `objdump -d -S\n/usr/local/lib/erlang/erts-7.3.1/beam.smp | less` (replace with a\nsuitable path to your copy of `beam.smp` or `beam`), search for\n`process_main`, then within `process_main`, look for the disassembly\nimmediately following macros like `SWAPIN`.  Chances are, you'll see\nsomething like this:\n\n```\n        SWAPIN;\n  43e01c:       4d 8b 55 50             mov    0x50(%r13),%r10\n  43e020:       ff 23                   jmpq   *(%rbx)\n  43e022:       49 8d 95 c8 02 00 00    lea    0x2c8(%r13),%rdx\n```\n\nFrom that, it seems pretty likely that `r13` is `c_p`, and `rbx` is\n`I`.  
We could be wrong, of course, but we can test it out with:\n\n```\nerlang-sample --force-c_p-register=r13 --force-I-register=rbx --pstack -d 1 PID\n```\n\nand see if the results are at all sensible.\n\n\n### `erlang-sample` doesn't seem to be able to unwind (no backtraces)\n\nUnfortunately, at the moment we rely on our slightly hacked vendored\ncopy of `elfutils`, which causes as many problems as it solves.  Try\n\n```\nLD_LIBRARY_PATH=vendor/elfutils/backends ./build/dist/erlang-sample --pstack PID\n```\n\nand see if it's any better.  `elfutils` may not be able to find the\nsuitable `EBL` backend, which it always loads dynamically even if\n`libdw` was statically linked into the program.\n\n\n## Open Problems\n\n### How much skid is there in a given measurement, and how can we reduce it?\n\nSee skid measurement options.\n\nWhen we receive an actual `process_main` sample, or something where\nthat's in the call stack, we actually have more information than it\nmight seem.\n\nWe can sample `E` and `I` when they're in registers.  There's a bunch of\nother corroborating evidence.  For example, we can look at what opcode\nwe were executing in `process_main`, and try to correlate it with\nopcodes in the source of the processes on that scheduler.\n\n\n### Can we avoid depending on `-ggdb` builds by writing the perf map from the VM itself?\n\nWe still need to know where `c_p`, `E`, `I`, and so on live, which\nrequires either DWARF or manual inspection of the source (or perhaps\nsome automated reverse engineering).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenrove%2Fextrospect-beam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftokenrove%2Fextrospect-beam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenrove%2Fextrospect-beam/lists"}