https://github.com/goceleris/benchmarks

Official reproducible benchmark suite for comparing Go HTTP server throughput and latency. Tests production frameworks against theoretical maximum implementations using raw syscalls.

# Celeris Benchmarks

Reproducible HTTP server benchmarks on dedicated bare-metal hardware with 10GbE point-to-point networking. Compares production Go frameworks against theoretical-maximum servers built directly on raw Linux syscalls.

## Why This Exists

Most HTTP benchmarks run on shared VMs with noisy neighbors, variable network hops, and throttled I/O — making results unreliable and non-reproducible. This suite runs on dedicated bare-metal machines with direct 10GbE links, automated kernel tuning, and CPU pinning, so every release gets consistent, comparable numbers.

We measure three categories of servers:
- **Baseline**: Production Go frameworks (Gin, Fiber, Echo, Chi, Iris, Hertz, FastHTTP, stdlib)
- **Celeris**: The [Celeris](https://github.com/goceleris/celeris) HTTP engine with io_uring, epoll, and adaptive backends
- **Theoretical**: Raw epoll/io_uring implementations showing the syscall performance ceiling

## Hardware

Three dedicated Minisforum mini PCs connected via 10GbE point-to-point links:

| Machine | Role | CPU | Cores/Threads | RAM | Network |
|---------|------|-----|---------------|-----|---------|
| MS-A2 | Client (self-hosted runner) | AMD Ryzen 9 9955HX (Zen 5) | 16C/32T | 32 GB DDR5 | 10GbE SFP+ |
| MS-A2 | x86 Server | AMD Ryzen 7 7745HX (Zen 4) | 8C/16T | 32 GB DDR5 | 10GbE SFP+ |
| MS-R1 | ARM64 Server | CIX CP8180 | 12C/12T | 64 GB LPDDR5 | Dual 10GbE RJ45 (RTL8127) |

All machines run Debian 13 (Trixie) with kernel 6.12+ for full io_uring support. The client machine is the GitHub Actions self-hosted runner that orchestrates everything via SSH.

## Benchmark Types

### Standard Level (7 types, ~66 min per architecture)

| Type | Endpoint | What It Tests |
|------|----------|---------------|
| `simple` | `GET /` | Plain text — pure framework overhead |
| `json` | `GET /json` | JSON serialization |
| `path` | `GET /users/:id` | Path parameter extraction + routing |
| `body` | `POST /upload` | 2 KB request body read |
| `headers` | `GET /users/:id` | Realistic API headers (~850 bytes: JWT, cookies, tracing) |
| `json-64k` | `GET /json-64k` | 64 KB JSON response — I/O throughput, efficiency metric |
| `churn` | `GET /` | New TCP connection per request — tests `accept()`, `SO_REUSEPORT` |
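`churn` is the only type that opens a new TCP connection for every request. The sketch below reproduces that traffic pattern. It assumes Go's standard `net/http` client rather than the suite's own load generator in `cmd/bench/`, and disables keep-alives so each request must dial again. The helpers `churnRequests` and `newCountingServer` are hypothetical; the server counts `StateNew` transitions to confirm that every request paid for its own connect and `accept()`.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// churnRequests sends n sequential GETs with keep-alives disabled, so every
// request dials a fresh TCP connection. It returns the count of 200 responses.
func churnRequests(url string, n int) (int, error) {
	client := &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
	ok := 0
	for i := 0; i < n; i++ {
		resp, err := client.Get(url)
		if err != nil {
			return ok, err
		}
		io.Copy(io.Discard, resp.Body) // drain before closing
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			ok++
		}
	}
	return ok, nil
}

// newCountingServer starts a local test server that counts accepted connections.
func newCountingServer() (*httptest.Server, *atomic.Int64) {
	conns := new(atomic.Int64)
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello, World!")
	}))
	// Fires once per accepted connection, before the request is served.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			conns.Add(1)
		}
	}
	srv.Start()
	return srv, conns
}

func main() {
	srv, conns := newCountingServer()
	defer srv.Close()

	ok, err := churnRequests(srv.URL+"/", 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("responses=%d new connections=%d\n", ok, conns.Load())
}
```

If keep-alives were left on, the same loop would normally reuse one connection. Disabling them is what shifts the cost from request parsing to connection setup.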

### Full Level (15 types, ~142 min per architecture)

Adds a **concurrency sweep** that scales connections from 1 to 10,000 on the `simple` endpoint:

```
simple@1 simple@10 simple@50 simple@100 simple@500 simple@1000 simple@5000 simple@10000
```

This produces scaling curves that show where goroutine-based frameworks plateau and where event-loop servers keep climbing.
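The sweep names follow a `type@connections` pattern. The minimal sketch below generates them using a hypothetical `sweepSpec` type, not the suite's real `BenchmarkSpec`, whose fields are not documented here. It also shows where the full level's count comes from: the 7 standard types plus 8 sweep points give 15.

```go
package main

import (
	"fmt"
	"strings"
)

// sweepSpec is a hypothetical stand-in for one benchmark entry: an endpoint
// type run at a fixed number of concurrent connections.
type sweepSpec struct {
	Type        string
	Connections int
}

// Name renders the spec in the "type@connections" form.
func (s sweepSpec) Name() string {
	return fmt.Sprintf("%s@%d", s.Type, s.Connections)
}

// concurrencySweep returns one spec per connection level, preserving order.
func concurrencySweep(typ string, levels []int) []sweepSpec {
	specs := make([]sweepSpec, 0, len(levels))
	for _, c := range levels {
		specs = append(specs, sweepSpec{Type: typ, Connections: c})
	}
	return specs
}

func main() {
	levels := []int{1, 10, 50, 100, 500, 1000, 5000, 10000}
	sweep := concurrencySweep("simple", levels)

	names := make([]string, len(sweep))
	for i, s := range sweep {
		names[i] = s.Name()
	}
	fmt.Println(strings.Join(names, " "))

	// 7 standard types + 8 sweep points = 15 full-level types.
	fmt.Println("full-level types:", 7+len(sweep))
}
```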

## Servers Tested

### Production Frameworks (Baseline)

| Server | Protocols | Framework |
|--------|-----------|-----------|
| stdhttp | H1, H2C, Hybrid | Go stdlib `net/http` |
| gin | H1, H2C, Hybrid | [Gin](https://github.com/gin-gonic/gin) |
| echo | H1, H2C, Hybrid | [Echo](https://github.com/labstack/echo) |
| chi | H1, H2C, Hybrid | [Chi](https://github.com/go-chi/chi) |
| iris | H1, H2C, Hybrid | [Iris](https://github.com/kataras/iris) |
| hertz | H1, H2C, Hybrid | [Hertz](https://github.com/cloudwego/hertz) |
| fiber | H1 | [Fiber](https://github.com/gofiber/fiber) (fasthttp-based) |
| fasthttp | H1 | [FastHTTP](https://github.com/valyala/fasthttp) |

### Celeris

| Server | Protocols | Engine |
|--------|-----------|--------|
| celeris-iouring | H1, H2C, Hybrid | io_uring (Linux 5.10+) |
| celeris-epoll | H1, H2C, Hybrid | epoll (Linux 2.6+) |
| celeris-adaptive | H1, H2C, Hybrid | Runtime engine selection |

Each engine runs with three resource profiles: `latency`, `throughput`, and `balanced`.

### Theoretical Maximum

| Server | Protocols | Implementation |
|--------|-----------|----------------|
| epoll | H1, H2C, Hybrid | Raw epoll with SO_REUSEPORT, SIMD header parsing, zero-alloc response path |
| iouring | H1, H2C, Hybrid | io_uring with SQPOLL, multishot accept, linked SQEs |
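`SO_REUSEPORT` is used by the theoretical epoll server and exercised by `churn`. It lets several sockets bind the same address, and the kernel then distributes new connections among them, typically one listener per event loop or core. The sketch below sets the option on Linux through the standard library's `net.ListenConfig.Control` hook. It is only an illustration, not the suite's raw-syscall implementation. The constant is hard-coded to keep the example dependency-free; `golang.org/x/sys/unix` exports it as `unix.SO_REUSEPORT`. `listenReusePort` is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// soReusePort is SO_REUSEPORT from asm-generic/socket.h (value 15), which
// applies to Linux on amd64 and arm64. This sketch is Linux-only.
const soReusePort = 0xf

// listenReusePort opens a TCP listener with SO_REUSEPORT set before bind(),
// so that several listeners can share one address.
func listenReusePort(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	// The first listener picks a free port; the second binds the same one.
	l1, err := listenReusePort("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer l1.Close()

	l2, err := listenReusePort(l1.Addr().String())
	if err != nil {
		panic(err)
	}
	defer l2.Close()

	fmt.Println("two listeners on", l1.Addr(), "and", l2.Addr())
}
```

Without the option on either socket, the second `Listen` call would fail with `EADDRINUSE`.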

## Dashboard & Results

Results are published to [goceleris/docs](https://github.com/goceleris/docs) as dashboard-format JSON (schema v4.0), keyed by Celeris version:

- `results/latest/{arch}.json` — most recent run
- `results/{version}/{arch}.json` — per-version archive

Dashboard data includes:
- **RPS and latency percentiles** (P50, P75, P90, P99, P999, P9999) per server per benchmark type
- **Concurrency scaling curves** — RPS at each concurrency level (full level only)
- **Efficiency metric** — RPS / Server CPU% per server, normalizing across core counts
- **System metrics** — server CPU, memory RSS, GC pauses (Go servers only)
- **Timeseries** — per-second RPS and P99 latency snapshots

## Running Benchmarks

Benchmarks are designed to run through GitHub Actions workflows. The self-hosted runner on the client machine handles everything: SSH into servers, deploy binaries, tune kernels, run benchmarks, collect results.

### Via GitHub Actions (Primary Method)

- **Release benchmarks**: Triggered automatically on every release, or manually through the `benchmark.yml` workflow dispatch. Releases run at the `full` level, which includes the concurrency sweep.
- **PR benchmarks**: Add the `benchmark` label to a pull request. Runs at `standard` level.

### Local Development

For local development and testing (not full benchmarks):

```bash
# Build server and bench binaries
mage build

# Run a quick local smoke test (5s per server, localhost)
mage benchmarkQuick
```

## CI/CD

| Workflow | Trigger | Level | Timeout |
|----------|---------|-------|---------|
| `benchmark.yml` | Release (auto) or manual dispatch | `full` on release, configurable on manual dispatch | 480 min |
| `benchmark-pr.yml` | PR with `benchmark` label | `standard` | 240 min |

Both workflows SSH to the bare-metal servers, deploy the server binary, run benchmarks, and collect results. Release runs also publish to the docs repository and trigger a site rebuild.

## Project Structure

```
cmd/bench/          Benchmark runner CLI (specs, runner, checkpoint)
cmd/server/         Server binary (all implementations + control daemon)
servers/
  baseline/         Production frameworks (gin, echo, chi, iris, etc.)
  celeris/          Celeris HTTP engine
  theoretical/      Raw epoll/iouring implementations
  common/           Shared types, payload generators, SIMD helpers
internal/
  dashboard/        Dashboard JSON format (schema v4.0)
  metrics/          Prometheus metrics definitions
  version/          Version info
config/
  hosts.json        Machine addresses and hardware metadata
```

## Contributing

### Requirements

- **Go 1.24+**: [Download](https://go.dev/dl/)
- **Mage**: `go install github.com/magefile/mage@latest`

### Development

```bash
mage check # deps + lint + vet + build
mage test # run tests
mage fmt # format code
```

### Adding a Server

1. Create a package under `servers/baseline/` (or `servers/theoretical/`)
2. Implement all benchmark endpoints: `GET /`, `GET /json`, `GET /json-1k`, `GET /json-64k`, `GET /users/:id`, `POST /upload`
3. Register the server type in `cmd/server/main.go`
4. Add to the server list in `cmd/bench/main.go`

### Adding a Benchmark Type

1. Add the endpoint to all server implementations
2. Add a `BenchmarkSpec` entry in `cmd/bench/main.go`
3. Update dashboard format if new fields are needed (`internal/dashboard/format.go`)

## License

Apache 2.0