# perf-cpp: Hardware Performance Monitoring for C++
![LGPL-3.0](https://img.shields.io/github/license/jmuehlig/perf-cpp?) ![LinuxKernel->=4.0](https://img.shields.io/badge/Linux_Kernel-%3E%3D4.0-yellow)
![C++17](https://img.shields.io/badge/C++-17-00599C?logo=cplusplus) [![Build and Test](https://github.com/jmuehlig/perf-cpp/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/jmuehlig/perf-cpp/actions/workflows/build-and-test.yml) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/jmuehlig/perf-cpp)

[Quick Start](#quick-start) | [How to Build](#building) | [Documentation](https://jmuehlig.github.io/perf-cpp) | [System Requirements](#system-requirements)

**perf-cpp** lets you profile specific parts of your code, *not the entire program*.

Tools like [Linux Perf](https://perfwiki.github.io/main/), [Intel® VTune™](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html), and [AMD uProf](https://www.amd.com/en/developer/uprof.html) profile everything: application startup, configuration parsing, data loading, and all your helper functions.
**perf-cpp** is different: place `start()` and `stop()` **around exactly the code you want to measure**.
Profile one sorting algorithm.
Measure cache misses in your hash table lookup.
Compare two memory allocators.
*Skip all the noise.*

## What can perf-cpp do?
Built around Linux's [*perf subsystem*](https://man7.org/linux/man-pages/man2/perf_event_open.2.html), **perf-cpp** lets you count and sample hardware events for specific code blocks:

- **Record hardware events** like `perf stat`, but only around the code you care about, *not the entire binary* ([documentation](https://jmuehlig.github.io/perf-cpp/recording/))
- **Calculate metrics** like cycles per instruction or cache miss ratios from the counters ([documentation](https://jmuehlig.github.io/perf-cpp/metrics/))
- **Read counter values without stopping** for low-overhead measurements in tight loops ([documentation](https://jmuehlig.github.io/perf-cpp/recording-live-events/))
- **Sample instructions and memory accesses** like `perf [mem] record`, but targeted at specific functions ([documentation](https://jmuehlig.github.io/perf-cpp/sampling/))
- **Export and analyze results** in your code: [write samples to CSV](https://jmuehlig.github.io/perf-cpp/sampling-export-to-csv/), [generate flame graphs](https://jmuehlig.github.io/perf-cpp/sampling-symbols-and-flamegraphs/), or [correlate memory accesses with specific data structures](https://jmuehlig.github.io/perf-cpp/sampling-memory-analysis/)
- **Mix built-in and processor-specific events**, from generic cycles and cache misses to vendor-specific PMU features ([documentation](https://jmuehlig.github.io/perf-cpp/counters/))

See various **[practical examples](examples/README.md)** and the **[full documentation](https://jmuehlig.github.io/perf-cpp/)** for more details.

## Quick Start
### Record Hardware Event Statistics
Count hardware events like `perf stat` (instructions, cycles, cache misses) while your code runs.

```cpp
#include <iostream>
#include <perfcpp/event_counter.hpp>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}
```

Possible output:
```
seconds:      0.0955897
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07
```

> [!NOTE]
> See the guides on **[recording event statistics](https://jmuehlig.github.io/perf-cpp/recording/)** and **[event statistics on multiple CPUs/threads](https://jmuehlig.github.io/perf-cpp/recording-parallel/)**.
> Check out the **[hardware events](https://jmuehlig.github.io/perf-cpp/counters/)** documentation for built-in and processor-specific events.
### Record Samples
Record snapshots like `perf [mem] record` (instruction pointer, CPU, timestamp) every 50,000 cycles.

```cpp
#include <iostream>
#include <perfcpp/sampler.hpp>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included in a sample: time, CPU ID, instruction
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .logical_instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples recorded during execution
sampler.stop();

const auto samples = sampler.result();

/// Export samples to CSV.
samples.to_csv("samples.csv");

/// Or access samples programmatically.
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

    std::cout
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}
```

Possible output:
```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```

> [!NOTE]
> See the **[sampling guide](https://jmuehlig.github.io/perf-cpp/sampling/)** for the data you can record.
> Also check out the **[sampling on multiple CPUs/threads guide](https://jmuehlig.github.io/perf-cpp/sampling-parallel/)** for parallel sampling.
## Building
*perf-cpp* is designed as a library (static or shared) that can be linked to your application.

```bash
git clone https://github.com/jmuehlig/perf-cpp.git
cd perf-cpp
cmake . -B build
cmake --build build
```

> [!NOTE]
> See the **[building guide](https://jmuehlig.github.io/perf-cpp/build/)** for CMake integration and build options.

## Documentation

The full documentation is available at **[jmuehlig.github.io/perf-cpp](https://jmuehlig.github.io/perf-cpp/)**.

See also: **[Examples](examples/README.md)** | **[Changelog](CHANGELOG.md)**

## System Requirements
- *Clang* / *GCC* with support for **C++17** features.
- *CMake* version **3.10** or higher.
- *Linux kernel* **4.0** or newer (some features require a newer kernel).
- `perf_event_paranoid` setting: adjust as needed to allow access to performance counters (see the [perf paranoid](https://jmuehlig.github.io/perf-cpp/perf-paranoid/) documentation).
- *Python 3*, if you use [processor-specific hardware event generation](https://jmuehlig.github.io/perf-cpp/build/#auto-generating-events-at-compile-time).
## Contribute and Contact
We welcome contributions and feedback.
For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: `jan.muehlig@tu-dortmund.de`.

---

## Further PMU-related Projects
Other profiling tools:

- [PAPI](https://github.com/icl-utk-edu/papi) monitors CPU counters, GPUs, I/O, and more.
- [Likwid](https://github.com/RRZE-HPC/likwid) is a set of command-line tools for benchmarking with an extensive [wiki](https://github.com/RRZE-HPC/likwid/wiki).
- [PerfEvent](https://github.com/viktorleis/perfevent) is a lightweight wrapper for performance counters.
- Intel's [Instrumentation and Tracing Technology](https://github.com/intel/ittapi) lets you control [Intel VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) from your code.
- Want to go lower-level? Use [perf_event_open](https://man7.org/linux/man-pages/man2/perf_event_open.2.html) directly.
## Resources about (Perf-) Profiling
Papers and articles about profiling (feel free to add your own via pull request):

### Academic Papers
- [Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis](https://soramichi.jp/pdf/ROSS2017.pdf) (2017)
- [Analyzing memory accesses with modern processors](https://dl.acm.org/doi/abs/10.1145/3399666.3399896) (2020)
- [Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10068807&tag=1) (2023)
- [Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE](https://arxiv.org/html/2410.01514v1) (2024)
- [Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs](https://dl.acm.org/doi/pdf/10.1145/3736227.3736241) (2025)

### Blog Posts
- [C2C - False Sharing Detection in Linux Perf](https://joemario.github.io/blog/2016/09/01/c2c-blog/) (2016)
- [PMU counters and profiling basics.](https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics) (2018)
- [Advanced profiling topics. PEBS and LBR.](https://easyperf.net/blog/2018/06/08/Advanced-profiling-topics-PEBS-and-LBR) (2018)
- [Detect false sharing with Data Address Profiling.](https://easyperf.net/blog/2019/12/17/Detecting-false-sharing-using-perf) (2019)