{"id":27386416,"url":"https://github.com/awelzel/zeek-spy","last_synced_at":"2025-04-13T17:24:35.347Z","repository":{"id":147842030,"uuid":"236863634","full_name":"awelzel/zeek-spy","owner":"awelzel","description":"Sampling Profiler for Zeek","archived":false,"fork":false,"pushed_at":"2020-02-22T15:37:29.000Z","size":152,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-03-26T14:15:41.290Z","etag":null,"topics":["pprof","profiler","ptrace","sampling","zeek"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awelzel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-28T23:17:22.000Z","updated_at":"2024-06-19T12:41:58.136Z","dependencies_parsed_at":"2023-05-27T16:30:30.236Z","dependency_job_id":null,"html_url":"https://github.com/awelzel/zeek-spy","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awelzel%2Fzeek-spy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awelzel%2Fzeek-spy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awelzel%2Fzeek-spy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awelzel%2Fzeek-spy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awelzel","download_url":"https://codeload.github.com/awelzel/zeek-spy/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248751261,"owners_count":21155858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pprof","profiler","ptrace","sampling","zeek"],"created_at":"2025-04-13T17:24:34.793Z","updated_at":"2025-04-13T17:24:35.332Z","avatar_url":"https://github.com/awelzel.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# zeek-spy - Sampling Profiler for Zeek\n\n**experimental - proof of concept**\n\n[![Build Status](https://travis-ci.org/awelzel/zeek-spy.svg?branch=master)](https://travis-ci.org/awelzel/zeek-spy)\n\n## How it works\n\n`zeek-spy` attaches to a running `zeek` process using `ptrace(2)` and reads\nthe `call_stack` and `g_frame_stack` memory.\n\nUsing the referenced `CallInfo`, `Func`, `Stmt`, `Frame` and `Location` objects,\na sample (call stack) of Zeek script land is created including function names,\nfilenames and line numbers.\n\nUpon termination `zeek-spy` writes all samples as a gzipped \"profile.proto\" file.\nThis file can then be analyzed with [pprof][1].\n\nThe idea was prompted by [rbspy][2] and [py-spy][3].\n\n## Compatibility / Limitations\n\nThis was developed against Zeek 3.0.1, compiled with GCC 8.3.0 on `x86_64`.\nConcretely, the [binary Zeek packages][4] for Debian 10 in version 3.0.1\nshould work.\n\nAnything else will (very very) likely not work. The current code uses\nhard-coded offsets related to the memory layout of `std::vector`, `std::string`,\n`CallInfo`, `Frame` and more. 
They may simply be wrong for a different Zeek\nand/or compiler version. C++ does not make this easier, either.\n\nMemory locations and offsets were determined with `elf`, `gdb`, `dwarfdump`\nand sometimes just counting.\n\n\n## Usage\n\n### Sample profile\n\nTo look at a sample profile provided in this repo:\n\n    $ pprof -http=localhost:9999 -ignore=empty_call_stack -trim=false -filefunctions ./sample-profiles/macdc2012.pb.gz\n\n### Profiling a running `zeek` process\n\nThis assumes `pgrep` finds just a single process.\n\n    $ sudo zeek-spy -pid $(pgrep zeek) -hz 250 -profile ./zeek.pb.gz\n    ...\n    Ctrl+C\n    \n    # zeek.pb.gz is in profile.proto format (https://github.com/google/pprof/tree/master/proto)\n\n    # Analyze with pprof\n    $ pprof -ignore=empty_call_stack -trim=false -lines  ./zeek.pb.gz\n    Type: samples\n    Time: Jan 30, 2020 at 1:35pm (CET)\n    Duration: 31.55s, Total samples = 7661\n    Entering interactive mode (type \"help\" for commands, \"o\" for options)\n    Active filters:\n       ignore=empty_call_stack\n    Showing nodes accounting for 4036, 52.68% of 7661 total\n          flat  flat%   sum%        cum   cum%\n           549  7.17%  7.17%        549  7.17%  Log::__write\n           436  5.69% 12.86%        436  5.69%  sha256_hash /home/awelzel/projects/zeek/scripts/base/init-bare.zeek:5171\n           269  3.51% 16.37%        269  3.51%  sha1_hash /home/awelzel/projects/zeek/scripts/base/init-bare.zeek:5171\n           245  3.20% 19.57%        245  3.20%  md5_hash /home/awelzel/projects/zeek/scripts/base/init-bare.zeek:5171\n           132  1.72% 21.29%        132  1.72%  connection_state_remove /home/awelzel/projects/zeek/scripts/base/protocols/http/main.zeek:329\n           119  1.55% 22.84%        119  1.55%  connection_state_remove /home/awelzel/projects/zeek/scripts/base/protocols/sip/main.zeek:296\n            83  1.08% 23.93%         83  1.08%  schedule_me scripts/slow_dns.zeek:32\n            78  1.02% 24.94%         78  
1.02%  connection_state_remove /home/awelzel/projects/zeek/scripts/base/protocols/dce-rpc/main.zeek:218\n            73  0.95% 25.90%        356  4.65%  dns_request scripts/slow_dns.zeek:12\n            72  0.94% 26.84%         72  0.94%  connection_state_remove /home/awelzel/projects/zeek/scripts/base/protocols/ftp/main.zeek:290\n            60  0.78% 27.62%         60  0.78%  connection_state_remove /home/awelzel/projects/zeek/scripts/base/protocols/socks/main.zeek:119\n            ...\n\n\n    # Or browse the profile interactively in a browser\n    $ pprof -http=localhost:9999 -ignore=empty_call_stack -trim=false -filefunctions ./zeek.pb.gz\n\n\n### Performance Impact\n\nThe `zeek` process is stopped while `zeek-spy` takes a sample. A separate\n`ptrace-attach` happens for every sample. Performance may degrade at very\nhigh, and possibly even moderate, sampling frequencies. The default is 100 Hz.\n\n`zeek-spy` outputs an estimate of the overhead while running\n(see the `-stats` option).\n\n`zeek-spy` itself is also quite naive about performance. There are various ways to improve\nsampling performance. 
Starting from caching \"constant\" memory locations,\nswitching to `process_vm_readv(2)` and, most likely, many Go-specific tweaks.\n\n\n### Profiling processing of a PCAP file\n\nThis is a bit of a crutch and basically the same as above, but nicer for testing:\n\n    $ timeout 10 /opt/zeek/bin/zeek -r ./pcaps/maccdc2012_00000.pcap \u0026 sleep 0.2 \u0026\u0026 ./zeek-spy -pid $(pgrep zeek) -hz 250 -profile ./macdc2012.pb.gz -stats 1s\n    2020/02/22 16:33:40 Using pid=31072, hz=250 period=4ms (4.000000 ms) profile=./zeek.pb.gz\n    2020/02/22 16:33:40 Profiling ZeekProcess{Pid=31072, Exe=/opt/zeek/bin/zeek, LoadAddr=0x55f9a6665000, CallStackAddr=0x55f9a73e2680, FrameStackAddr=0x55f9a73e2470 VersionAddr=0x55f9a73dd330}\n    2020/02/22 16:33:40 Found Zeek version '3.0.1'\n    2020/02/22 16:33:41 [STATS] elapsed=1.00s samples=134 (250 total) skipped=0 frequency=250.0hz overhead=2.76% (27.578542ms)\n    2020/02/22 16:33:42 [STATS] elapsed=2.00s samples=337 (500 total) skipped=0 frequency=250.0hz overhead=3.49% (34.902129ms)\n    2020/02/22 16:33:43 [STATS] elapsed=3.00s samples=560 (750 total) skipped=0 frequency=250.0hz overhead=4.08% (40.829118ms)\n    ...\n    1331901122.870000 received termination signal\n    2020/02/22 16:33:50 [STATS] elapsed=10.00s samples=2097 (2500 total) skipped=0 frequency=250.0hz overhead=4.48% (44.825587ms)\n    2020/02/22 16:33:50 [WARN] wait() failed for 31072: process exited\n    2020/02/22 16:33:50 [WARN] Could not detach from process: no such process\n    2020/02/22 16:33:50 [WARN] Failed to spy, exiting (process exited)\n    2020/02/22 16:33:50 Writing protobuf...\n    2020/02/22 16:33:50 Done.\n\n\n### pprof flags\n\nBecause event handlers all share the same function name and do not live in a module,\nthe output varies with the `granularity` setting of `pprof`.\n\nUsing `lines` or `filefunctions` gives reasonable results.\n\n    $ pprof -ignore=empty_call_stack -trim=false -lines   ./zeek.pb.gz\n\nThe `-ignore=empty_call_stack` 
is used to filter out all samples where\nthe `call_stack` was empty. This is useful when there is very little\ntraffic and the `empty_call_stack` samples dominate the profile.\n\n\n[1]: https://github.com/google/pprof\n[2]: https://github.com/rbspy/rbspy\n[3]: https://github.com/benfred/py-spy\n[4]: https://software.opensuse.org//download.html?project=security%3Azeek\u0026package=zeek\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawelzel%2Fzeek-spy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawelzel%2Fzeek-spy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawelzel%2Fzeek-spy/lists"}