https://github.com/grudshtein/mpmc-ring

Header-only, bounded lock-free MPMC ring buffer for C++20 using per-slot sequence numbers, with a reproducible benchmark suite and latency histograms (p50/p95/p99/p99.9).

benchmarking concurrency cpp cpp20 lock-free low-latency mpmc ring-buffer


Host: GitHub
URL: https://github.com/grudshtein/mpmc-ring
Owner: grudshtein
License: mit
Created: 2025-09-21T17:14:14.000Z (10 months ago)
Default Branch: master
Last Pushed: 2025-10-16T03:17:27.000Z (9 months ago)
Last Synced: 2025-10-17T06:03:43.394Z (9 months ago)
Topics: benchmarking, concurrency, cpp, cpp20, lock-free, low-latency, mpmc, ring-buffer
Language: C++
Homepage:
Size: 1.89 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: docs/README.md
- License: LICENSE


# MPMC Ring

[![CI](https://github.com/grudshtein/mpmc-ring/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/grudshtein/mpmc-ring/actions/workflows/ci.yml)

![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)

![C++20](https://img.shields.io/badge/C%2B%2B-20-00599C?logo=c%2B%2B)

![CMake >=3.21](https://img.shields.io/badge/CMake-%3E%3D3.21-064F8C?logo=cmake)

![Compilers](https://img.shields.io/badge/Compilers-MSVC%20%7C%20Clang%20%7C%20GCC-6aa84f)

![Sanitizers](https://img.shields.io/badge/Sanitizers-ASan%20%7C%20TSan%20%7C%20UBSan-orange)

MPMC Ring is a high-performance, bounded multi-producer/multi-consumer ring buffer for C++20. It provides a work-conserving blocking (ticketed) path and a non-blocking `try_*` API, uses atomic operations (no mutexes), and targets low-latency inter-thread messaging.

The library is header-only and ships with benchmarks and tests for reproducibility.

## Overview

Bounded multi-producer / multi-consumer ring buffer with:

- Blocking, ticketed fast path and non-blocking `try_*` API
- No mutexes; uses atomic operations
- Per-slot acquire/release handoff; relaxed cursors
- Thread pinning and cursor padding; reproducible benchmark with CSV output
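The per-slot handoff can be illustrated with a minimal sketch of the classic Vyukov-style bounded MPMC queue, covering only the non-blocking `try_*` path. This is not the library's code: the class name `SeqRing`, the power-of-two capacity, and the requirement that `T` be default-constructible are assumptions made to keep it short. Each slot's sequence number records whose turn it is (`seq == pos` means free for the producer at `pos`, `seq == pos + 1` means filled for the consumer at `pos`). Cursors are claimed with relaxed CAS, and the acquire/release pairing lives on the slot sequence.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Sketch of a bounded MPMC ring with per-slot sequence numbers (Vyukov-style).
// Hypothetical class, not MpmcRing; only the non-blocking try_* path is shown.
template <typename T>
class SeqRing {
 public:
  // Assumption: capacity is a power of two, so index = position & mask.
  explicit SeqRing(std::size_t capacity)
      : mask_(capacity - 1), slots_(std::make_unique<Slot[]>(capacity)) {
    assert(capacity >= 2 && (capacity & mask_) == 0);
    // Slot i starts with sequence i: free for the producer at position i.
    for (std::size_t i = 0; i < capacity; ++i)
      slots_[i].seq.store(i, std::memory_order_relaxed);
  }

  // Accepts lvalues and rvalues, like try_push(const T&) / try_push(T&&).
  template <typename U>
  bool try_push(U&& value) {
    std::size_t pos = tail_.load(std::memory_order_relaxed);
    for (;;) {
      Slot& s = slots_[pos & mask_];
      // Acquire pairs with the consumer's release that freed this slot.
      const std::size_t seq = s.seq.load(std::memory_order_acquire);
      const auto diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
      if (diff == 0) {
        // Slot is free on this lap; claim the position (relaxed cursor).
        if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
          s.value = std::forward<U>(value);
          s.seq.store(pos + 1, std::memory_order_release);  // publish to consumers
          return true;
        }
        // On failure, compare_exchange_weak reloaded pos; retry.
      } else if (diff < 0) {
        return false;  // slot still holds last lap's element: ring is full
      } else {
        pos = tail_.load(std::memory_order_relaxed);  // another producer moved ahead
      }
    }
  }

  bool try_pop(T& out) {
    std::size_t pos = head_.load(std::memory_order_relaxed);
    for (;;) {
      Slot& s = slots_[pos & mask_];
      // Acquire pairs with the producer's release that published this slot.
      const std::size_t seq = s.seq.load(std::memory_order_acquire);
      const auto diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos + 1);
      if (diff == 0) {
        if (head_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
          out = std::move(s.value);
          // Hand the slot back to the producer one lap ahead.
          s.seq.store(pos + mask_ + 1, std::memory_order_release);
          return true;
        }
      } else if (diff < 0) {
        return false;  // nothing published at this position yet: ring is empty
      } else {
        pos = head_.load(std::memory_order_relaxed);  // another consumer moved ahead
      }
    }
  }

 private:
  struct Slot {
    std::atomic<std::size_t> seq{0};
    T value{};  // assumption: T is default-constructible
  };
  const std::size_t mask_;
  std::unique_ptr<Slot[]> slots_;
  alignas(64) std::atomic<std::size_t> tail_{0};  // producer cursor, own cache line
  alignas(64) std::atomic<std::size_t> head_{0};  // consumer cursor, own cache line
};
```

The signed `diff` distinguishes the three cases (free, still full from the previous lap, or already taken by another thread) without extra state per slot.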

## API

- `try_push(const T&)` / `try_push(T&&)`: non-blocking; returns `false` if the ring is full
- `push(const T&)` / `push(T&&)`: blocking; spins until the element is enqueued
- `try_pop(T&)`: non-blocking; returns `false` if the ring is empty
- `pop(T&)`: blocking; spins until an element is dequeued
- Template: `MpmcRing`, with a cursor-padding toggle; owns its bounded storage.

Full signatures: see [`include/mpmc.hpp`](../include/mpmc.hpp).
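The blocking path is described as ticketed. One plausible reading, sketched here under that assumption, is that `push`/`pop` take a unique position with a single `fetch_add` and then wait on their slot's sequence number, so there is no CAS retry loop. `TicketRing` is a hypothetical name and the sketch offers only the blocking calls; how `MpmcRing` actually combines tickets with the `try_*` path is defined in `include/mpmc.hpp`, not here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical "ticketed" blocking ring: one fetch_add per call hands out a
// unique position, then the caller waits on that slot's sequence number.
template <typename T>
class TicketRing {
 public:
  explicit TicketRing(std::size_t capacity)
      : mask_(capacity - 1), slots_(std::make_unique<Slot[]>(capacity)) {
    assert(capacity >= 1 && (capacity & mask_) == 0);  // power of two assumed
    for (std::size_t i = 0; i < capacity; ++i)
      slots_[i].seq.store(i, std::memory_order_relaxed);
  }

  template <typename U>
  void push(U&& value) {
    const std::size_t t = tail_.fetch_add(1, std::memory_order_relaxed);  // ticket
    Slot& s = slots_[t & mask_];
    // Wait until the consumer from the previous lap has released the slot.
    while (s.seq.load(std::memory_order_acquire) != t) std::this_thread::yield();
    s.value = std::forward<U>(value);
    s.seq.store(t + 1, std::memory_order_release);  // publish to the pop with ticket t
  }

  void pop(T& out) {
    const std::size_t t = head_.fetch_add(1, std::memory_order_relaxed);  // ticket
    Slot& s = slots_[t & mask_];
    // Wait until the push holding the same ticket has published.
    while (s.seq.load(std::memory_order_acquire) != t + 1) std::this_thread::yield();
    out = std::move(s.value);
    s.seq.store(t + mask_ + 1, std::memory_order_release);  // free for the next lap
  }

 private:
  struct Slot {
    std::atomic<std::size_t> seq{0};
    T value{};
  };
  const std::size_t mask_;
  std::unique_ptr<Slot[]> slots_;
  alignas(64) std::atomic<std::size_t> tail_{0};
  alignas(64) std::atomic<std::size_t> head_{0};
};
```

Because a ticket is taken before its slot is ready, a caller cannot back out once it holds one, which is why this form suits only the blocking calls. The sketch yields while waiting so it behaves on oversubscribed machines; a latency-oriented ring would more likely busy-spin.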

## Build

### Visual Studio (Windows)

1. Open the folder as a **CMake Project**.

2. Set **x64 | Release**.

3. Run `tests` or `bench`.

Typical paths:

- `out/build/msvc-ninja-release/tests.exe`

- `out/build/msvc-ninja-release/bench.exe`

### MSYS2 / MinGW

```bash
cmake --preset=mingw-release
cmake --build --preset=mingw-release -j
./out/build/mingw-release/bench
```

## Benchmark defaults

Unless overridden on the command line, the benchmark runs with: producers=consumers=1, capacity=65,536, mode=blocking, warmup=2,500 ms, duration=17,500 ms, bucket_width=5 ns, buckets=4,096, padding=on, pinning=on, large_payload=off, move_only_payload=off.
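The latency percentiles come from fixed-width histograms. A sketch of how p50/p95/p99/p99.9 can be read from buckets configured like the defaults above (5 ns wide, 4,096 buckets, so 20,480 ns of range) follows. `LatencyHistogram` is hypothetical, and two details are assumptions rather than documented behavior: out-of-range samples are clamped into the last bucket, and a percentile is reported as the upper edge of the bucket that contains it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width latency histogram using the documented defaults
// (bucket_width=5 ns, buckets=4,096). Not the benchmark suite's own code.
class LatencyHistogram {
 public:
  explicit LatencyHistogram(std::uint64_t bucket_width_ns = 5, std::size_t buckets = 4096)
      : width_(bucket_width_ns), counts_(buckets, 0) {}

  void record(std::uint64_t latency_ns) {
    // Assumption: samples past the last bucket are clamped into it.
    const auto idx = std::min<std::uint64_t>(latency_ns / width_, counts_.size() - 1);
    ++counts_[static_cast<std::size_t>(idx)];
    ++total_;
  }

  // Upper edge (ns) of the bucket holding the p-quantile, p in (0, 1],
  // e.g. 0.50, 0.95, 0.99, 0.999.
  std::uint64_t percentile(double p) const {
    if (total_ == 0) return 0;
    const auto rank = static_cast<std::uint64_t>(std::ceil(p * static_cast<double>(total_)));
    std::uint64_t seen = 0;
    for (std::size_t i = 0; i < counts_.size(); ++i) {
      seen += counts_[i];
      if (seen >= rank) return (i + 1) * width_;
    }
    return counts_.size() * width_;
  }

 private:
  std::uint64_t width_;
  std::vector<std::uint64_t> counts_;
  std::uint64_t total_ = 0;
};
```

With 5 ns buckets, every reported percentile is a multiple of 5 ns, so differences smaller than one bucket width are not resolvable.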

**Testbed:** Windows 11 (24H2)

**CPU:** Intel Core i7-11800H (8c/16t)

**Compiler:** MSVC 19.44 (Visual Studio 2022 17.10), `/O2 /GL`  

**Build system:** CMake 4.1.1 + Ninja 1.11.1 (Release) 

**Power plan:** Legion Balance Mode (OEM Balanced)

### Reproduce a 4p4c run

```bash
# Blocking (defaults)
./out/build/mingw-release/bench --producers 4 --consumers 4

# Non-blocking A/B
./out/build/mingw-release/bench --producers 4 --consumers 4 --blocking off
```

See `--help` for all options and defaults.

## Results (summary)

**Figure settings (used for all charts unless stated otherwise):** producers=consumers=4, capacity=65,536, mode=blocking, warmup=2,500 ms, duration=17,500 ms, bucket_width=5 ns, buckets=4,096, padding=on, pinning=on, large_payload=off, move_only_payload=off.

![Blocking vs Non-blocking Throughput (MPMC)](fig/mode_comparison.png)

![Latency vs Pinning/Padding (4p4c)](fig/latency_vs_pinning_padding.png)

![Latency vs Payload (4p4c)](fig/latency_vs_payload.png)

![Latency vs Threads](fig/latency_vs_threads.png)

![Latency Distribution (4p4c)](fig/pop_hist.png)

### Blocking vs non-blocking

At 4p4c, blocking outperforms non-blocking in throughput by ~2–3× on this host, while also tightening p99/p99.9 latency.

### Pinning and padding

Thread pinning and cursor padding materially reduce tail latency.
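Cursor padding targets false sharing: without it, the producer and consumer cursors can sit in the same cache line, so every update to one invalidates other cores' copy of the line holding the other. A sketch of a compile-time padding toggle, assuming 64-byte cache lines (typical for x86-64 CPUs such as the i7-11800H above); `Cursors` and `cursors_share_line` are hypothetical names, not the library's.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Where the standard library provides it,
// std::hardware_destructive_interference_size (<new>) can replace this constant.
inline constexpr std::size_t kCacheLine = 64;

// Hypothetical cursor block with a compile-time padding toggle.
template <bool Padded>
struct Cursors {
  static constexpr std::size_t kAlign =
      Padded ? kCacheLine : alignof(std::atomic<std::uint64_t>);
  alignas(kAlign) std::atomic<std::uint64_t> head{0};  // written by consumers
  alignas(kAlign) std::atomic<std::uint64_t> tail{0};  // written by producers
};

// Unpadded: both cursors fit within one 64-byte span, so they usually share a
// line and every producer update to `tail` invalidates consumers' copy of `head`.
static_assert(sizeof(Cursors<false>) <= kCacheLine);
// Padded: each cursor starts its own 64-byte line.
static_assert(sizeof(Cursors<true>) == 2 * kCacheLine);

// True if both cursors of this instance live in the same cache line.
template <bool Padded>
bool cursors_share_line(const Cursors<Padded>& c) {
  const auto head_line = reinterpret_cast<std::uintptr_t>(&c.head) / kCacheLine;
  const auto tail_line = reinterpret_cast<std::uintptr_t>(&c.tail) / kCacheLine;
  return head_line == tail_line;
}
```

The trade-off is memory: padding turns 16 bytes of cursors into 128, which is negligible next to a 65,536-slot buffer.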

### Payload sensitivity

Large copyable payloads pay for data movement; move-only payloads keep latency close to that of small POD payloads.
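The payload effect follows from what crosses the ring on each handoff: a large copyable payload is copied byte for byte, while a move-only payload that owns its data through a pointer moves only the pointer. The payload types and the `Slot` wrapper below are hypothetical stand-ins; the README does not document the types behind `large_payload` and `move_only_payload`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical payloads; the benchmark's actual large/move-only types are
// not documented, so these are stand-ins.
struct LargePayload {
  std::array<std::byte, 1024> bytes{};  // copyable: a handoff copies 1 KiB
};
using MoveOnlyPayload = std::unique_ptr<LargePayload>;  // a handoff moves one pointer

// Minimal stand-in for a ring slot with copy and move overloads,
// mirroring push(const T&) / push(T&&).
template <typename T>
struct Slot {
  T value{};
  void store(const T& v) { value = v; }        // copy-assign: copies all of T
  void store(T&& v) { value = std::move(v); }  // move-assign; for LargePayload this is still a 1 KiB copy
};
```

Note that moving a `LargePayload` does not help, since `std::array` has no heap buffer to steal; only the pointer-owning type avoids the copy.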

### Concurrency

As the thread count increases, p50, p99, and p99.9 latency all grow.

### Throughput notes

- On this Windows testbed, total throughput plateaued by 4p4c despite the ticketed fast path.
- Explicit thread pinning (affinity APIs) did not change this result.
- Blocking is ~2–3× faster than non-blocking at 4p4c; pinning and padding primarily tighten the tails.
- Better scaling is expected on a single-socket Linux host.
- Next step for higher throughput: sharding the queue into N sub-rings.
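Sharding is only named as a next step, so the following is one possible shape for it, not a planned design: N independent sub-queues, with producers routed by a hint (for example a producer index) and consumers scanning from their preferred shard. To keep it short, the inner queue is a mutex-based stand-in (`LockedQueue`); a real version would use a bounded lock-free sub-ring. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in inner queue to keep the sketch short (NOT lock-free); a real
// shard would be a bounded lock-free sub-ring.
template <typename T>
class LockedQueue {
 public:
  bool try_push(const T& v) {
    std::lock_guard<std::mutex> lock(m_);
    q_.push_back(v);
    return true;  // unbounded stand-in never reports full
  }
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> lock(m_);
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop_front();
    return true;
  }

 private:
  std::mutex m_;
  std::deque<T> q_;
};

// Hypothetical sharded front end over N independent sub-queues.
// FIFO order holds only within a shard, not across the whole structure.
template <typename Inner, typename T>
class Sharded {
 public:
  explicit Sharded(std::size_t shards) : shards_(shards) {}

  // Producers route by a hint such as a producer index.
  bool try_push(std::size_t hint, const T& v) {
    return shards_[hint % shards_.size()].try_push(v);
  }

  // Consumers try their preferred shard first, then scan the others.
  bool try_pop(std::size_t hint, T& out) {
    for (std::size_t i = 0; i < shards_.size(); ++i)
      if (shards_[(hint + i) % shards_.size()].try_pop(out)) return true;
    return false;
  }

 private:
  std::vector<Inner> shards_;  // sized once; Inner need not be movable
};
```

The design choice being traded is ordering for contention: each shard's cursors see only a fraction of the threads, but a consumer can observe elements from different shards out of global order.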
