https://github.com/quinnjr/blackbox-bench

A lightweight, zero-dependency Python benchmarking library with decorator and context manager APIs, CLI, and run comparison
benchmark benchmarking cli developer-tools optimization performance profiling python testing timing

Host: GitHub
URL: https://github.com/quinnjr/blackbox-bench
Owner: quinnjr
License: mit
Created: 2026-02-13T15:35:35.000Z (5 months ago)
Default Branch: develop
Last Pushed: 2026-05-20T16:44:57.000Z (about 2 months ago)
Last Synced: 2026-05-20T21:55:27.020Z (about 2 months ago)
Topics: benchmark, benchmarking, cli, developer-tools, optimization, performance, profiling, python, testing, timing
Language: Python
Homepage: https://quinnjr.github.io/blackbox-bench/
Size: 236 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# blackbox-bench

A lightweight Python microbenchmarking library with a Rust core (PyO3 + maturin). Designed as a "criterion for Python" — minimal harness overhead, statistically rigorous output, and a small CLI that drops into CI.

```python
import blackbox_bench

bench = blackbox_bench.Bench()

@bench.benchmark
def hash_1kb():
    import hashlib
    hashlib.sha256(b"x" * 1024).digest()

bench.run()
bench.report()                # human-readable table
bench.report(format="json", path="results.json")
```

```
blackbox-bench results
─────────────────────────────────────────────────────────────────────────────────
Name        Mean   Median  StdDev    Min    Max  Ops/sec          CI 95%  Outliers
─────────────────────────────────────────────────────────────────────────────────
hash_1kb  1.4 µs   1.4 µs  12.0 ns 1.4 µs 1.5 µs  710,221  [1.4 µs, 1.4 µs]        3
─────────────────────────────────────────────────────────────────────────────────
```

## Features

- **Per-batch timing with auto-batch-sizing** — sub-microsecond functions are batched until each sample takes ≥5 µs, well above the timer resolution floor.

- **Per-iteration overhead measurement and subtraction** — the harness probes itself at startup so reported nanoseconds attribute time to your code, not blackbox-bench.

- **Bootstrap 95% confidence intervals** for the mean, plus Tukey or MAD outlier detection.

- **`blackbox_bench.black_box(value)`** — opaque pass-through that the optimiser can't see through.

- **`bench.iter_batched(setup, routine)`** — setup runs once per sample (untimed); routine runs the timed batch.

- **`throughput=`** — `@bench.benchmark(throughput=1024)` reports MB/s alongside ops/sec.

- **`params=[...]`** — parameterised benchmarks; one result per parameter.

- **Opt-in HDR histograms** — `Bench(histogram=True)` attaches a percentile-queryable `HdrHistogram` to each result.

- **`blackbox_bench.compare(baseline_json, current_json)`** — classifies each row as `unchanged` / `regressed` / `improved` / `new` / `removed` using CI overlap, not raw `change_pct`.

- **Four reporters** — table, JSON, self-contained HTML (with inline SVG sparklines), JUnit-compatible XML (with a raw alternative).

- **`blackbox-bench run --profile`** — wraps each benchmark in [`py-spy`](https://github.com/benfred/py-spy) and emits SVG flamegraphs alongside the results.
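The statistics named in the list above can be illustrated in plain Python. The following is a minimal sketch of a percentile bootstrap for the mean and of Tukey fences, using only the standard library. The function names, the resample count, and the sample timings are hypothetical, and this is not blackbox-bench's Rust implementation.

```python
import random
import statistics


def bootstrap_ci_mean(samples, resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lo = means[int(tail * resamples)]
    hi = means[min(resamples - 1, int((1.0 - tail) * resamples))]
    return lo, hi


def tukey_outliers(samples, k=1.5):
    """Return the samples outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if x < low or x > high]


# Made-up per-call timings in nanoseconds, with one slow sample.
timings_ns = [1400, 1405, 1398, 1410, 1402, 1399, 1401, 1404, 1397, 2100]
lo, hi = bootstrap_ci_mean(timings_ns)
print(f"95% CI for the mean: [{lo:.1f}, {hi:.1f}] ns")
print("Tukey outliers:", tukey_outliers(timings_ns))
```

A single slow sample widens the interval noticeably, which is one reason to report outliers next to the mean.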

## Install

```bash
pip install blackbox-bench
```

Pre-built abi3 wheels are published for CPython 3.10 / 3.11 / 3.12 / 3.13 on Linux (x86_64 + aarch64), macOS (x86_64 + aarch64), and Windows x86_64. Building from source requires a Rust toolchain.

Optional extras:

```bash
pip install blackbox-bench[profile]   # adds py-spy for `blackbox-bench run --profile`
```

## Usage

### Decorator API

```python
import blackbox_bench

bench = blackbox_bench.Bench(
    warmup=5,
    target_time_ns=1_000_000_000,
    outlier_method="tukey",        # or "mad" / "none"
    overhead_subtract=True,
    histogram=False,
)

@bench.benchmark
def quick():
    sum(range(100))

@bench.benchmark(name="hash_kb", throughput=1024, params=[10, 100, 1000])
def hashing(n):
    import hashlib
    hashlib.sha256(b"x" * n).digest()

results = bench.run()
bench.report(format="html", path="results.html")
```
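To make the `throughput=` figure concrete, here is a small arithmetic sketch. It assumes that `throughput` is the number of bytes processed per call and that MB means 10^6 bytes; neither assumption is confirmed by this README, and the helper name is hypothetical.

```python
def throughput_mb_per_s(bytes_per_call, mean_ns):
    """Convert bytes per call and mean nanoseconds per call into MB/s."""
    calls_per_s = 1e9 / mean_ns
    return bytes_per_call * calls_per_s / 1e6


# 1024 bytes hashed in a mean of 1.4 µs per call.
print(f"{throughput_mb_per_s(1024, 1400.0):.1f} MB/s")  # prints "731.4 MB/s"
```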

### `iter_batched` for setup-isolated timing

```python
import random

import blackbox_bench

bench = blackbox_bench.Bench()

@bench.benchmark
def sort_random():
    return bench.iter_batched(
        setup=lambda: random.sample(range(1_000), 1_000),
        routine=lambda xs: sorted(xs),
    )
```

`setup` runs once per *sample* and isn't timed; `routine` is the timed call inside the batch.
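The split between untimed setup and timed routine described above can be sketched in pure Python. `time_batched` and its parameters are hypothetical names for illustration; the real harness does this in Rust and picks the batch size automatically.

```python
import random
import time


def time_batched(setup, routine, samples=20, batch_size=50):
    """Time `routine` while keeping `setup` outside the timed window.

    Returns the average nanoseconds per call, one value per sample.
    """
    per_call_ns = []
    for _ in range(samples):
        data = setup()                      # untimed, once per sample
        start = time.perf_counter_ns()
        for _ in range(batch_size):
            routine(data)                   # timed batch
        elapsed = time.perf_counter_ns() - start
        per_call_ns.append(elapsed / batch_size)
    return per_call_ns


timings = time_batched(
    setup=lambda: random.sample(range(1_000), 1_000),
    routine=lambda xs: sorted(xs),
)
print(f"{len(timings)} samples, fastest {min(timings):.0f} ns per call")
```

Because the same input is reused across a batch in this sketch, a routine that mutates its argument (for example `xs.sort()`) would see already-sorted data after the first call. A non-mutating routine such as `sorted(xs)` avoids that.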

### Context manager

```python
with bench.measure("payload_build"):
    payload = build_huge_payload()

bench.run()  # appends measured contexts to results
```
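For intuition, a one-shot timing context manager can be written with the standard library alone. This is a sketch under assumed names (`measure`, `results`), not how `bench.measure` is implemented.

```python
import time
from contextlib import contextmanager

# Maps a label to the list of elapsed times (ns) recorded under it.
results = {}


@contextmanager
def measure(name):
    """Record the wall-clock nanoseconds spent inside the `with` body."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        # Record the timing even if the body raises.
        results.setdefault(name, []).append(time.perf_counter_ns() - start)


with measure("payload_build"):
    payload = [i * i for i in range(10_000)]

print(results)
```

A context manager times its body exactly once, so it suits coarse, expensive blocks rather than sub-microsecond functions.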

### Module-level decorator + CLI

```python
# bench_hashing.py
import blackbox_bench

@blackbox_bench.benchmark
def sha256_1kb():
    import hashlib
    hashlib.sha256(b"x" * 1024).digest()
```

```bash
blackbox-bench run bench_hashing.py --warmup 5 --iterations 100
blackbox-bench run benches/ --format html --output report.html
blackbox-bench run benches/ --save baseline.json
blackbox-bench compare baseline.json current.json   # CI-classified diff
```

### Comparing runs

```python
from pathlib import Path

import blackbox_bench

# compare() takes the two JSON documents as strings.
report = blackbox_bench.compare(
    Path("baseline.json").read_text(),
    Path("current.json").read_text(),
)

for row in report.rows:
    print(row.name, row.classification, row.change_pct)
```

`ComparisonReport.format("xml")` emits JUnit XML that marks regressed rows as failures; drop the file into Jenkins or GitHub Actions test reporters.
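The idea of classifying by confidence-interval overlap rather than by raw percentage change can be sketched as follows. The `Interval` type and `classify` function are hypothetical, and the exact rule blackbox-bench applies may differ; this only shows why a small positive `change_pct` need not count as a regression.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    lo: float
    hi: float


def classify(baseline, current):
    """Classify a benchmark from two confidence intervals for the mean (ns).

    Either argument may be None when the benchmark is missing from that run.
    """
    if baseline is None:
        return "new"
    if current is None:
        return "removed"
    if current.lo > baseline.hi:
        return "regressed"   # slower, and the intervals do not overlap
    if current.hi < baseline.lo:
        return "improved"    # faster, and the intervals do not overlap
    return "unchanged"       # intervals overlap: difference is within noise


print(classify(Interval(1390, 1410), Interval(1405, 1425)))  # unchanged
print(classify(Interval(1390, 1410), Interval(1500, 1520)))  # regressed
```

In the first call the midpoint moved by about 1%, but the intervals still overlap, so the row is reported as `unchanged`.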

## Why Rust under the hood

The harness has to stay small relative to the user's function:

- **Tight sampling loop** — `Instant::now()` and the per-batch call dispatch are raw FFI (`PyObject_CallNoArgs` / `PyObject_CallObject`), skipping PyO3's higher-level wrappers inside the timed window.

- **GIL released during stats** — bootstrap CI, Tukey/MAD, and the histogram run inside `py.detach(...)` so other Python threads aren't blocked while blackbox-bench crunches its samples.

- **Reused scratch buffers** — `median`, `tukey`, `mad`, and `bootstrap_ci_mean` share two `Vec`s owned by the `Runner`; a 100-benchmark suite still allocates only once for the lot.

The criterion benchmarks at `benches/rust_internals.rs` measure these primitives directly. `benches/bench_dogfood.py` measures the assembled harness end-to-end (the empty-pass benchmark should report ~0–1 ns after overhead subtraction).
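The batching and overhead-subtraction ideas behind that empty-pass check can be approximated in pure Python. The sketch below uses hypothetical helper names and takes only the 5 µs sample floor from the Features list; a Python loop has far more overhead than the Rust harness, so the numbers it prints are illustrative only.

```python
import time


def pick_batch_size(fn, min_sample_ns=5_000, max_batch=1 << 20):
    """Double the batch size until one batch takes at least `min_sample_ns`."""
    batch = 1
    while batch < max_batch:
        start = time.perf_counter_ns()
        for _ in range(batch):
            fn()
        if time.perf_counter_ns() - start >= min_sample_ns:
            break
        batch *= 2
    return batch


def per_call_ns(fn, batch, samples=30):
    """Best (minimum) average nanoseconds per call over `samples` batches."""
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(batch):
            fn()
        best = min(best, (time.perf_counter_ns() - start) / batch)
    return best


def noop():
    pass


def work():
    sum(range(100))


batch = pick_batch_size(work)
overhead = per_call_ns(noop, batch)   # loop and call cost of the harness itself
raw = per_call_ns(work, batch)
adjusted = max(0.0, raw - overhead)   # clamp so noise cannot go negative
print(f"batch={batch} raw={raw:.1f} ns overhead={overhead:.1f} ns adjusted={adjusted:.1f} ns")
```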

## Migrating from 0.1.0

See [MIGRATION.md](MIGRATION.md). Most v0.1.0 code only needs an import swap; the one surface that changed without an alias is `Bench.report(json_output=True)` → `Bench.report(format="json")`. For exact v0.1.0 semantics:

```python
from blackbox_bench.legacy import Bench, benchmark, BenchmarkResult
```

The legacy shim is removed in v1.1.

## License

MIT — see [LICENSE](LICENSE).
