# perf-cpp: Hardware Performance Monitoring for C++
![LGPL-3.0](https://img.shields.io/github/license/jmuehlig/perf-cpp?) ![LinuxKernel->=4.0](https://img.shields.io/badge/Linux_Kernel-%3E%3D4.0-yellow)
![C++17](https://img.shields.io/badge/C++-17-00599C?logo=cplusplus) [![Build and Test](https://github.com/jmuehlig/perf-cpp/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/jmuehlig/perf-cpp/actions/workflows/build-and-test.yml) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/jmuehlig/perf-cpp)

[Quick Start](#quick-start) | [How to Build](#building) | [Documentation](https://jmuehlig.github.io/perf-cpp) | [System Requirements](#system-requirements)

**perf-cpp** lets you profile specific parts of your code, *not the entire program*.

Tools like [Linux Perf](https://perfwiki.github.io/main/), [Intel® VTune™](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html), and [AMD uProf](https://www.amd.com/en/developer/uprof.html) profile everything: application startup, configuration parsing, data loading, and all your helper functions.
**perf-cpp** is different: place `start()` and `stop()` **around exactly the code you want to measure**.
Profile one sorting algorithm.
Measure cache misses in your hash table lookup.
Compare two memory allocators.
*Skip all the noise.*

## What can perf-cpp do?
Built around Linux's [*perf subsystem*](https://man7.org/linux/man-pages/man2/perf_event_open.2.html), **perf-cpp** lets you count and sample hardware events for specific code blocks:

- **Record hardware events** like `perf stat`, but only around the code you care about, *not the entire binary* ([documentation](https://jmuehlig.github.io/perf-cpp/recording/))
- **Calculate metrics** like cycles per instruction or cache miss ratios from the counters ([documentation](https://jmuehlig.github.io/perf-cpp/metrics/))
- **Read counter values without stopping** for low-overhead measurements in tight loops ([documentation](https://jmuehlig.github.io/perf-cpp/recording-live-events/))
- **Sample instructions and memory accesses** like `perf [mem] record`, but targeted at specific functions ([documentation](https://jmuehlig.github.io/perf-cpp/sampling/))
- **Export and analyze results** in your code: [write samples to CSV](https://jmuehlig.github.io/perf-cpp/sampling-export-to-csv/), [generate flame graphs](https://jmuehlig.github.io/perf-cpp/sampling-symbols-and-flamegraphs/), or [correlate memory accesses with specific data structures](https://jmuehlig.github.io/perf-cpp/sampling-memory-analysis/)
- **Mix built-in and processor-specific events** like cycles, cache misses, or vendor PMU features ([documentation](https://jmuehlig.github.io/perf-cpp/counters/))

See various **[practical examples](examples/README.md)** and the **[full documentation](https://jmuehlig.github.io/perf-cpp/)** for more details.

## Quick Start
### Record Hardware Event Statistics
Count hardware events like `perf stat`—instructions, cycles, cache misses—while your code runs.

```cpp
#include <perfcpp/event_counter.h>

#include <iostream>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result)
{
  std::cout << event_name << ": " << value << std::endl;
}
```

Possible output:
```
seconds: 0.0955897
instructions: 5.92087e+07
cycles: 4.70254e+08
cache-misses: 1.35633e+07
```

> [!NOTE]
> See the guides on **[recording event statistics](https://jmuehlig.github.io/perf-cpp/recording/)** and **[event statistics on multiple CPUs/threads](https://jmuehlig.github.io/perf-cpp/recording-parallel/)**.
> Check out the **[hardware events](https://jmuehlig.github.io/perf-cpp/counters/)** documentation for built-in and processor-specific events.

### Record Samples
Record snapshots like `perf [mem] record`—instruction pointer, CPU, timestamp—every 50,000 cycles.

```cpp
#include <perfcpp/sampler.h>

#include <iostream>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included in a sample: time, CPU ID, instruction
sampler.values()
.timestamp(true)
.cpu_id(true)
.logical_instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples recorded during execution
sampler.stop();

const auto samples = sampler.result();

/// Export samples to CSV.
samples.to_csv("samples.csv");

/// Or access samples programmatically.
for (const auto& record : samples)
{
  const auto timestamp = record.metadata().timestamp().value();
  const auto cpu_id = record.metadata().cpu_id().value();
  const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

  std::cout
    << "Time = " << timestamp << " | CPU = " << cpu_id
    << " | Instruction = 0x" << std::hex << instruction << std::dec
    << std::endl;
}
```

Possible output:
```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```

> [!NOTE]
> See the **[sampling guide](https://jmuehlig.github.io/perf-cpp/sampling/)** for what data you can record.
> Also check out the **[sampling on multiple CPUs/threads guide](https://jmuehlig.github.io/perf-cpp/sampling-parallel/)** for parallel sampling.

## Building
*perf-cpp* is designed as a library (static or shared) that can be linked into your application.

```bash
git clone https://github.com/jmuehlig/perf-cpp.git
cd perf-cpp
cmake . -B build
cmake --build build
```
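One way to consume the library from your own project is CMake's `FetchContent`; this is only a sketch, and the target name `perf-cpp` is an assumption, so check the building guide for the exact target name and supported options:

```cmake
# Sketch: pull perf-cpp at configure time and link it into an executable.
# The target name "perf-cpp" is assumed; see the building guide.
include(FetchContent)
FetchContent_Declare(
  perf-cpp
  GIT_REPOSITORY https://github.com/jmuehlig/perf-cpp.git
)
FetchContent_MakeAvailable(perf-cpp)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE perf-cpp)
```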

> [!NOTE]
> See the **[building guide](https://jmuehlig.github.io/perf-cpp/build/)** for CMake integration and build options.

## Documentation

The full documentation is available at **[jmuehlig.github.io/perf-cpp](https://jmuehlig.github.io/perf-cpp/)**.

See also: **[Examples](examples/README.md)** | **[Changelog](CHANGELOG.md)**

## System Requirements
- *Clang* / *GCC* with support for **C++17** features.
- *CMake* version **3.10** or higher.
- *Linux Kernel* **4.0** or newer (some features require a more recent kernel).
- `perf_event_paranoid` setting: Adjust as needed to allow access to performance counters (see the [perf paranoid](https://jmuehlig.github.io/perf-cpp/perf-paranoid/) documentation).
- *Python3*, if you make use of [processor-specific hardware event generation](https://jmuehlig.github.io/perf-cpp/build/#auto-generating-events-at-compile-time).
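On most distributions, the `perf_event_paranoid` level can be inspected and adjusted as follows (a sketch; the appropriate value depends on your use case and security policy, see the linked documentation):

```shell
# Show the current restriction level; higher values restrict access further
# (e.g. 2 limits measurements to user-space events of your own processes).
cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo "unavailable (not Linux?)"

# Temporarily relax the restriction (requires root; resets on reboot):
# sudo sysctl -w kernel.perf_event_paranoid=1
```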

## Contribute and Contact
We welcome contributions and feedback.
For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: `jan.muehlig@tu-dortmund.de`.

---

## Further PMU-related Projects
Other profiling tools:

- [PAPI](https://github.com/icl-utk-edu/papi) monitors CPU counters, GPUs, I/O, and more.
- [Likwid](https://github.com/RRZE-HPC/likwid) is a set of command-line tools for benchmarking with an extensive [wiki](https://github.com/RRZE-HPC/likwid/wiki).
- [PerfEvent](https://github.com/viktorleis/perfevent) is a lightweight wrapper for performance counters.
- Intel's [Instrumentation and Tracing Technology](https://github.com/intel/ittapi) lets you control [Intel VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) from your code.
- Want to go lower-level? Use [perf_event_open](https://man7.org/linux/man-pages/man2/perf_event_open.2.html) directly.

## Resources about (Perf-) Profiling
Papers and articles about profiling (feel free to add your own via pull request):

### Academic Papers
- [Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis](https://soramichi.jp/pdf/ROSS2017.pdf) (2017)
- [Analyzing memory accesses with modern processors](https://dl.acm.org/doi/abs/10.1145/3399666.3399896) (2020)
- [Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10068807&tag=1) (2023)
- [Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE](https://arxiv.org/html/2410.01514v1) (2024)
- [Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs](https://dl.acm.org/doi/pdf/10.1145/3736227.3736241) (2025)

### Blog Posts
- [C2C - False Sharing Detection in Linux Perf](https://joemario.github.io/blog/2016/09/01/c2c-blog/) (2016)
- [PMU counters and profiling basics.](https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics) (2018)
- [Advanced profiling topics. PEBS and LBR.](https://easyperf.net/blog/2018/06/08/Advanced-profiling-topics-PEBS-and-LBR) (2018)
- [Detect false sharing with Data Address Profiling.](https://easyperf.net/blog/2019/12/17/Detecting-false-sharing-using-perf) (2019)