{"id":49032544,"url":"https://github.com/bhope/hedge","last_synced_at":"2026-04-28T02:00:54.914Z","repository":{"id":346966013,"uuid":"1191402807","full_name":"bhope/hedge","owner":"bhope","description":"Adaptive hedged requests for Go. Cut your p99 latency with zero configuration. Based on Google's \"The Tail at Scale\" paper.","archived":false,"fork":false,"pushed_at":"2026-04-15T01:25:12.000Z","size":571,"stargazers_count":170,"open_issues_count":9,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-15T02:22:58.507Z","etag":null,"topics":["concurrency","go","golang","golang-library","grpc","hedging","microservices","performance","tail-latency"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bhope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-25T07:54:54.000Z","updated_at":"2026-04-15T00:09:36.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bhope/hedge","commit_stats":null,"previous_names":["bhope/hedge"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/bhope/hedge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhope%2Fhedge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhope%2Fhedge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhope%2Fhedge/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhope%2Fhedge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bhope","download_url":"https://codeload.github.com/bhope/hedge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhope%2Fhedge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32362782,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"online","status_checked_at":"2026-04-28T02:00:07.250Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","go","golang","golang-library","grpc","hedging","microservices","performance","tail-latency"],"created_at":"2026-04-19T10:00:35.122Z","updated_at":"2026-04-28T02:00:54.909Z","avatar_url":"https://github.com/bhope.png","language":"Go","funding_links":[],"categories":["Networking"],"sub_categories":["HTTP Clients"],"readme":"# hedge\n\n[![Go Report Card](https://goreportcard.com/badge/github.com/bhope/hedge)](https://goreportcard.com/report/github.com/bhope/hedge) [![Go Reference](https://pkg.go.dev/badge/github.com/bhope/hedge.svg)](https://pkg.go.dev/github.com/bhope/hedge) [![codecov](https://codecov.io/gh/bhope/hedge/branch/main/graph/badge.svg)](https://codecov.io/gh/bhope/hedge) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\nIn a fan-out 
architecture with 100 backends, 63% of your requests hit at least one straggler (1 − 0.99¹⁰⁰).\n\nhedge learns per-host latency distributions using DDSketch, fires a backup request when the primary exceeds its estimated p90, and caps hedge rate with a token bucket to prevent load amplification during outages. Result: p99 drops from 64ms to 17ms - a 74% reduction - with ~9% overhead. Zero configuration. Supports streaming workloads including LLM inference (measures time-to-first-token, not just headers).\n\nBased on Dean \u0026 Barroso, [\"The Tail at Scale\"](https://research.google/pubs/the-tail-at-scale/) (CACM 2013) and Masson et al., [\"DDSketch\"](https://arxiv.org/abs/2004.08604) (VLDB 2019).\n\n## Quick Start\n\n```sh\ngo get github.com/bhope/hedge\n```\n\n```go\nclient := \u0026http.Client{Transport: hedge.New(http.DefaultTransport)}\nresp, err := client.Get(\"https://api.example.com/data\")\n```\n\n## Table of Contents\n\n- [Quick Start](#quick-start)\n- [Evaluation](#evaluation)\n- [How it works](#how-it-works)\n- [Why not a static threshold?](#why-not-a-static-threshold)\n- [Streaming / LLM inference](#streaming--llm-inference)\n- [gRPC support](#grpc-support)\n- [Configuration](#configuration)\n- [Observability](#observability)\n- [References](#references)\n- [Contributing](#contributing)\n- [License](#license)\n\n\n## Evaluation\n\n50,000 requests against a simulated backend with lognormal base latency (mean=5ms, σ=2ms) and 5% straggler probability (10× multiplier).\n\n| Configuration    |  p50  |  p90  |   p95  |   p99  |  p999   | Overhead |\n|------------------|-------|-------|--------|--------|---------|----------|\n| No hedging       | 5.0ms | 8.9ms | 17.1ms | 64.3ms | 104.5ms |    0.0%  |\n| Static 10ms      | 5.0ms | 8.9ms | 13.1ms | 17.4ms |  46.5ms |    7.4%  |\n| Static 50ms      | 5.0ms | 8.9ms | 18.9ms | 54.7ms |  60.2ms |    2.1%  |\n| Adaptive (hedge) | 5.0ms | 8.9ms | 12.3ms | 17.0ms |  59.4ms |    8.9%  |\n\nAdaptive matches the best 
hand-tuned static threshold at p99 (17.0ms vs 17.4ms) with no manual configuration. Static 50ms is too conservative at p95 (18.9ms vs 12.3ms); static 10ms does edge out adaptive at p999 (46.5ms vs 59.4ms), but it was hand-tuned for exactly this workload. In production, where latency shifts with load and deployments, a fixed threshold goes stale; adaptive does not.\n\n![Latency percentiles across configurations](eval.png)\n\nReproduce: `cd benchmark/simulate \u0026\u0026 go run .`\n\n\n## How it works\n\n**1. DDSketch quantile estimator.** Each target host gets a `WindowedSketch`, a pair of DDSketches rotating every 30 seconds. DDSketch uses logarithmic bucket mapping to provide relative-error guarantees: any quantile estimate is within ±1% of the true value, regardless of the value's magnitude. This matters for latency: a 1% error on a 10ms p90 is 0.1ms, not a fixed absolute error. The rolling window ensures the sketch tracks current conditions, not the entire history.\n\n**2. Adaptive trigger.** On each request, the transport queries the sketch for the configured percentile (default p90). If the primary has not responded by that deadline, a backup request is fired using a child context derived from the caller's. Whichever response arrives first is returned; the loser's context is cancelled and its body drained to return the connection to the pool.\n\n**3. Token bucket budget.** Hedges are rate-limited by a token bucket that refills at `estimatedRPS × budgetPercent / 100` tokens per second (defaults: 100 RPS and 10%, i.e. 10 tokens per second). During genuine outages, when every request stalls and the p90 estimate collapses to the minimum delay, the bucket drains within seconds and hedging stops, preventing the load-doubling spiral that would deepen the incident.\n\n\n## Why not a static threshold?\n\nA fixed 10ms threshold calibrated against today's traffic will be wrong tomorrow. Latency shifts with load, GC pauses, cold JVM instances after a deploy, and time of day. 
At off-peak, the real p90 might be 3ms, so a 10ms threshold almost never fires and provides little benefit. At peak, the real p90 might be 40ms, so a 10ms threshold hedges most requests and nearly doubles backend load. You would need per-service, per-environment thresholds updated continuously. The sketch updates on every completed request; the threshold is always current.\n\n\n## Streaming / LLM inference\n\nMost HTTP servers delay headers until the response is ready, so time-to-first-header (TTFH) ≈ total response latency and is a valid hedge signal. LLM inference servers differ: they commonly flush a `200 OK` before the first token is generated, then stream the response body. On these servers, TTFH is near-zero for every request; only time-to-first-token (TTFT), the latency to the first readable byte, reflects actual inference cost.\n\nhedge records latency at first body byte via `ttftBody.Read`, not at header receipt. This gives the sketch the correct signal for prefill-disaggregated architectures where the header and the first token arrive tens to hundreds of milliseconds apart.\n\n**Benchmark results** -- streaming server (200 OK flushed immediately; first token delayed):\n\n| Mode               |  p50   |  p90    |  p95    |  p99    | LatencyEstimate(p80) |\n|--------------------|--------|---------|---------|---------|----------------------|\n| TTFB (old, broken) |  1.2ms |  2.0ms  |  2.3ms  |  3.0ms  | 1.6ms (near-zero: wrong signal) |\n| TTFT (new, fixed)  | 16.4ms | 199.4ms | 216.0ms | 243.5ms | 25.9ms (actual inference latency) |\n\nA hedge calibrated from the TTFB sketch would use a ~1ms delay and fire a redundant request on virtually every call. 
Calibrated from TTFT, it fires only on the slow 20% (cache misses).\n\n**End-to-end latency** -- gateway server (TTFH ≈ TTFT, hedge fires while blocked on headers):\n\n| Mode                          |  p50   |  p90    |  p95    |  p99    | Overhead |\n|-------------------------------|--------|---------|---------|---------|----------|\n| No hedging                    | 16.4ms | 199.6ms | 217.5ms | 245.6ms |    0.0%  |\n| TTFB-miscalibrated (1ms)      | 14.7ms |  19.4ms |  23.6ms | 198.7ms |  100.0%  |\n| TTFT-calibrated (p80 ≈ 47ms)  | 16.4ms |  61.6ms | 182.5ms | 224.2ms |   17.0%  |\n\nThe miscalibrated transport cuts p90 sharply (199.6ms to 19.4ms) but doubles load and still leaves p99 at 198.7ms. The TTFT-calibrated transport targets cache misses specifically, cutting p90 from 199.6ms to 61.6ms at 17% overhead.\n\n![TTFB vs TTFT sketch signal on a streaming server](eval_streaming.png)\n\nReproduce: `cd benchmark/streaming \u0026\u0026 go run .`\n\n\n## gRPC support\n\n```go\nconn, err := grpc.NewClient(target,\n    grpc.WithTransportCredentials(insecure.NewCredentials()),\n    grpc.WithUnaryInterceptor(hedge.NewUnaryClientInterceptor(\n        hedge.WithEstimatedRPS(500),\n        hedge.WithBudgetPercent(10),\n    )),\n)\n```\n\nAll options are supported. 
Per-target latency tracking uses `cc.Target()` as the host key.\n\n\n## Configuration\n\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `WithPercentile(q)` | float64 | 0.90 | Sketch quantile used as hedge trigger |\n| `WithMaxHedges(n)` | int | 1 | Maximum concurrent hedge requests per call |\n| `WithBudgetPercent(p)` | float64 | 10.0 | Max hedge rate as percent of total traffic |\n| `WithEstimatedRPS(r)` | float64 | 100 | Expected requests per second; sets token bucket capacity |\n| `WithMinDelay(d)` | time.Duration | 1ms | Floor on the hedge delay |\n| `WithStats(s)` | `**Stats` | nil | Pointer to receive the live `Stats` struct |\n\n\n## Observability\n\n```go\nvar stats *hedge.Stats\n\nclient := \u0026http.Client{\n    Transport: hedge.New(http.DefaultTransport,\n        hedge.WithStats(\u0026stats),\n    ),\n}\n\n// After requests:\nsnap := stats.Snapshot()\nfmt.Printf(\"total=%d hedged=%d hedge_wins=%d budget_exhausted=%d\\n\",\n    snap.TotalRequests,\n    snap.HedgedRequests,\n    snap.HedgeWins,\n    snap.BudgetExhausted,\n)\nfmt.Printf(\"hedge_rate=%.2f\\n\", stats.HedgeRate())\n```\n\n`Stats` fields are `atomic.Int64` and safe to read concurrently. `Snapshot()` takes a consistent point-in-time copy. `HedgeRate()` returns `HedgedRequests / TotalRequests`.\n\n\n## References\n\n- Jeffrey Dean and Luiz André Barroso. [\"The Tail at Scale.\"](https://research.google/pubs/the-tail-at-scale/) *Communications of the ACM*, 56(2):74–80, February 2013.\n- Charles Masson, Jee E. Rim, and Homin K. Lee. [\"DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees.\"](https://arxiv.org/abs/2004.08604) *Proceedings of the VLDB Endowment*, 12(12):2195–2205, 2019.\n\n\n## Contributing\n\nContributions are welcome! 
Please open an issue to discuss your idea before submitting a PR.\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.\n\n\n## License\n\nhedge is released under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbhope%2Fhedge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbhope%2Fhedge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbhope%2Fhedge/lists"}