{"id":50225629,"url":"https://github.com/ameyaborkar/throttlekit","last_synced_at":"2026-06-09T23:00:45.931Z","repository":{"id":360282528,"uuid":"1249388004","full_name":"AmeyaBorkar/throttlekit","owner":"AmeyaBorkar","description":"Rate limiting for Node and the web: GCRA-default, one transform across in-memory/Redis/Postgres (proven bit-identical), and two-tier token leasing with a formally-verified overshoot bound independent of fleet size. Sync API, multi-dimensional checks, fixed-memory DDoS sketches.","archived":false,"fork":false,"pushed_at":"2026-06-04T14:54:00.000Z","size":2118,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-04T15:17:50.576Z","etag":null,"topics":["adaptive-concurrency","backpressure","cloudflare-workers","edge","express","fetch","gcra","leaky-bucket","lua","rate-limiter","rate-limiting","redis","sliding-window","throttle","token-bucket","typescript"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/throttlekit","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AmeyaBorkar.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-25T16:40:24.000Z","updated_at":"2026-06-04T14:42:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AmeyaBorkar/throttlekit","commit_stats":null,"previous_names":["ameyabo
rkar/throttlekit"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/AmeyaBorkar/throttlekit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmeyaBorkar%2Fthrottlekit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmeyaBorkar%2Fthrottlekit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmeyaBorkar%2Fthrottlekit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmeyaBorkar%2Fthrottlekit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AmeyaBorkar","download_url":"https://codeload.github.com/AmeyaBorkar/throttlekit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmeyaBorkar%2Fthrottlekit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34129072,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adaptive-concurrency","backpressure","cloudflare-workers","edge","express","fetch","gcra","leaky-bucket","lua","rate-limiter","rate-limiting","redis","sliding-window","throttle","token-bucket","typescript"],"created_at":"2026-05-26T15:01:28.621Z","updated_at":"2026-06-09T23:00:40.916Z","avatar_url":"https://github.com/
AmeyaBorkar.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ThrottleKit\n\n[![npm](https://img.shields.io/npm/v/throttlekit.svg)](https://www.npmjs.com/package/throttlekit)\n[![CI](https://github.com/AmeyaBorkar/throttlekit/actions/workflows/ci.yml/badge.svg)](https://github.com/AmeyaBorkar/throttlekit/actions/workflows/ci.yml)\n[![types: included](https://img.shields.io/npm/types/throttlekit.svg)](https://www.npmjs.com/package/throttlekit)\n[![node: \u003e=18](https://img.shields.io/node/v/throttlekit.svg)](https://www.npmjs.com/package/throttlekit)\n[![license: MIT](https://img.shields.io/npm/l/throttlekit.svg)](./LICENSE)\n\n**Rate limiting for Node and the web — one small core, from a sub-microsecond in-process check to a distributed fleet with a _proven_ overshoot bound.**\n\nPick an algorithm, a backend (in-memory, Redis, or Postgres), and your framework — the limiting logic stays the same. ThrottleKit rests on three ideas: **algorithms** are pure functions of time, **storage** is one atomic primitive, and **adapters** are thin glue. That separation is what lets the *same* config run as an allocation-free in-process check or atomically across a cluster.\n\nWhat sets it apart is the distributed story: it isn't \"a Redis counter and hope.\" The two-tier leasing path has a **formally-verified overshoot bound** that, with window-coupling, is **independent of fleet size** — model-checked, not asserted. It's young (`0.x`) but heavily tested: 389 tests, a dual-path (JS↔Lua) conformance suite, and a TLA⁺-checked protocol. 
The benchmarks below are reproducible on your hardware, and include the cases where ThrottleKit *loses*.\n\n**Jump to:** [Install](#install) · [Quick start](#quick-start) · [Strategies](#choosing-a-strategy) · [Frameworks](#frameworks-and-the-edge) · [Distributed \u0026 provable](#going-distributed) · [Multi-dimensional](#limiting-on-several-axes) · [Backpressure](#backpressure-and-shaping) · [DDoS](#surviving-a-flood) · [Fairness](#overload-and-fairness) · [Performance](#performance) · [Migrating](#migrating) · [How it's tested](#how-its-tested)\n\n---\n\n## Highlights\n\n- **Good defaults.** GCRA out of the box — smooth pacing, controlled bursts, one timestamp of state per key.\n- **Seven strategies.** GCRA, token bucket, fixed \u0026 sliding window, sliding-window log, leaky-bucket shaping, and adaptive concurrency.\n- **Three backends, identical decisions.** In-memory, Redis (one atomic Lua round trip), and Postgres (no Redis required) — proven bit-identical by the conformance suite.\n- **Six frameworks + the edge.** Express, Web `fetch` (Cloudflare/Deno/Bun/Next edge), Hono, Next.js, Fastify, Koa — each its own subpath.\n- **Provably scales out.** Two-tier token leasing and multi-region share one engine with a *formally-verified* overshoot bound (TLA⁺/TLC); opt into `lease.windowCoupled` and that bound becomes **independent of fleet size**.\n- **A synchronous API.** `checkSync` — uncommon among JS limiters — for hot paths that don't want an `await`.\n- **Operable.** Standards headers (IETF draft + RFC 9651 + legacy), proxy-correct IP keys, IPv6 aggregation, HMAC key hashing, OpenTelemetry, and zero-config analytics.\n- **TypeScript-first.** ESM + CJS, 11 entry points, strict types, all peer deps optional.\n\n---\n\n## Install\n\n```sh\nnpm i throttlekit\n```\n\nPeer dependencies are **optional** — install only the ones for the adapters you use (`ioredis` for `throttlekit/redis`, `pg` for `throttlekit/postgres`, `express`, `@opentelemetry/api`, …). 
The Web `fetch` adapter (`throttlekit/fetch`) needs none — it uses the global `Request`/`Response` (Node 18+, Cloudflare Workers, Deno, Bun).\n\n---\n\n## Quick start\n\nIn-memory GCRA — no infrastructure, works out of the box:\n\n```ts\nimport { rateLimit, gcra } from \"throttlekit\";\n\nconst limiter = rateLimit({\n  // 100 requests per minute, with an instantaneous burst allowance of 20.\n  strategy: gcra({ limit: 100, periodMs: 60_000, burst: 20 }),\n  // `store` defaults to a fresh in-process MemoryStore.\n});\n\nconst decision = await limiter.check(userId); // cost defaults to 1\nif (!decision.allowed) {\n  // 429 with Retry-After: Math.ceil(decision.retryAfterMs / 1000)\n  throw new Error(`rate limited; retry in ${decision.retryAfterMs}ms`);\n}\n```\n\nEvery check returns an immutable `Decision`:\n\n```ts\ninterface Decision {\n  allowed: boolean;     // permit or reject\n  limit: number;        // effective ceiling (burst capacity or window quota)\n  remaining: number;    // whole units left before the next rejection (never negative)\n  resetAt: number;      // epoch-ms when the limiter is fully replenished\n  retryAfterMs: number; // 0 when allowed; otherwise how long to wait\n}\n```\n\nWith the in-memory store you also get a synchronous, zero-`await` fast path, and a `cost` for weighting:\n\n```ts\nconst d = limiter.checkSync(userId); // MemoryStore only; throws on async stores\nawait limiter.check(userId, 5);      // this request costs 5 units\n```\n\nCheck **many keys at once** — each at one consistent timestamp, returned in order:\n\n```ts\nconst decisions = await limiter.checkMany([ip, userId, apiKey]); // Decision[] in input order\nconst all = limiter.checkManySync(keys);                         // MemoryStore: one loop, no promises\n```\n\nOn an async store the checks fire concurrently — and collapse to a single round trip on clients that pipeline same-tick commands (node-redis, or `ioredis` with `enableAutoPipelining`). 
Runnable versions of every section below live in [`examples/`](./examples).\n\n---\n\n## Choosing a strategy\n\nPick one and pass it to `rateLimit({ strategy })`:\n\n| Goal | Strategy |\n|---|---|\n| Good general default — tiny state, smooth pacing, controlled bursts | **`gcra({ limit, periodMs, burst? })`** |\n| Client-friendly \"tokens remaining\", controlled bursts | `tokenBucket({ capacity, refillPerSec })` |\n| Cheapest coarse cap (allows up to 2× across a boundary, by design) | `fixedWindow({ limit, windowMs })` |\n| Near-exact rolling window at any limit, bounded memory | `slidingWindow({ limit, windowMs, buckets? })` |\n| Exact \"N in the last X\" at low/moderate limits | `slidingWindowLog({ limit, windowMs })` |\n| Shape/queue outbound calls to a fixed rate (delays, doesn't reject) | `leakyBucket({ ratePerSec, maxQueueMs })` |\n| Protect a service from overload when the right rate is unknown | `adaptiveConcurrency({ ... })` |\n\n- `gcra` stores a single number (the theoretical arrival time); `burst` defaults to `limit`.\n- `slidingWindow` — `buckets` defaults to 10 (error ≈ 1/buckets of the window); `buckets: 1` is the classic estimator. `slidingWindowLog` is exact but O(limit) memory/key.\n- `leakyBucket` and `adaptiveConcurrency` build a `Shaper` / concurrency guard, not a `Limiter` — see [Backpressure and shaping](#backpressure-and-shaping).\n\n---\n\n## Frameworks and the edge\n\n### Express\n\n```ts\nimport { expressRateLimit } from \"throttlekit/express\";\nimport { gcra } from \"throttlekit\";\n\napp.use(\n  expressRateLimit({\n    strategy: gcra({ limit: 100, periodMs: 60_000, burst: 20 }),\n    // Default key is a proxy-correct, IPv6-aggregated client IP (see Headers, IPs, and PII).\n    key: (req) =\u003e req.headers[\"x-api-key\"]?.toString() ?? req.ip ?? \"anon\",\n    cost: (req) =\u003e (req.method === \"POST\" ? 
5 : 1),\n    fail: \"open\",            // allow if the store is unreachable (\"open\" | \"closed\")\n    emit: { draft: true },   // emit IETF draft RateLimit headers (the default)\n    onLimited: (req, _res, d) =\u003e console.warn(\"blocked\", req.path, d.retryAfterMs),\n  }),\n);\n```\n\nOn a denial the middleware responds `429` with `Retry-After`. Pass `handler` to own the `429`, or `limiter` instead of `strategy` to share a prebuilt limiter. See [`examples/express.ts`](./examples/express.ts).\n\n### Web / edge (`fetch`)\n\nRuns on Cloudflare Workers, Deno, Bun, and Next.js edge. The default key tries `cf-connecting-ip`, then `x-forwarded-for` (via the trusted-proxy policy), then `\"anon\"`.\n\n```ts\nimport { withRateLimit } from \"throttlekit/fetch\";\nimport { gcra } from \"throttlekit\";\n\nexport default {\n  fetch: withRateLimit(\n    (req: Request) =\u003e new Response(`hello ${new URL(req.url).pathname}`),\n    { strategy: gcra({ limit: 30, periodMs: 10_000 }), emit: { draft: true } },\n  ),\n};\n```\n\nOn allow it forwards to your handler and copies the rate-limit headers onto the response; on deny it returns `429`. 
See [`examples/fetch-edge.ts`](./examples/fetch-edge.ts).\n\n### Hono, Next.js, Fastify, Koa\n\nEvery adapter shares the same options (`strategy`/`limiter`, `store`, `key`, `cost`, `fail`, `emit`, `onLimited`, `handler`, trusted-proxy config) and the same standards headers — only the binding differs, and each is its own subpath.\n\n```ts\nimport { honoRateLimit } from \"throttlekit/hono\";       // edge-first\napp.use(\"*\", honoRateLimit({ strategy: gcra({ limit: 30, periodMs: 10_000 }) }));\n\nimport { fastifyRateLimit } from \"throttlekit/fastify\";  // Fastify v5\nfastify.addHook(\"onRequest\", fastifyRateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }) }));\n\nimport { koaRateLimit } from \"throttlekit/koa\";          // Koa v3\napp.use(koaRateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }) }));\n```\n\n**Next.js** middleware is dependency-free (`NextRequest`/`NextResponse` are Web `Request`/`Response`) — call the limiter, then branch:\n\n```ts\n// middleware.ts\nimport { NextResponse, type NextRequest } from \"next/server\";\nimport { nextRateLimit } from \"throttlekit/next\";\nimport { gcra } from \"throttlekit\";\n\nconst limit = nextRateLimit({ strategy: gcra({ limit: 30, periodMs: 10_000 }) });\n\nexport async function middleware(req: NextRequest) {\n  const r = await limit(req);\n  if (r.limited) return r.response;            // 429 (or 503 on a fail-closed outage)\n  const res = NextResponse.next();\n  for (const [k, v] of Object.entries(r.headers)) res.headers.set(k, v);\n  return res;\n}\n```\n\nFor Next.js **route handlers**, use `throttlekit/fetch` directly — they're Web `fetch` handlers.\n\n---\n\n## Going distributed\n\nThe same strategy you ran in-memory runs across a fleet — just hand it a distributed store. 
Every backend produces the same decisions (the conformance suite proves it).\n\n### Redis — atomic Lua, one round trip\n\n```ts\nimport { rateLimit, gcra } from \"throttlekit\";\nimport { RedisStore } from \"throttlekit/redis\";\nimport Redis from \"ioredis\";\n\nconst store = new RedisStore({ client: new Redis(process.env.REDIS_URL) });\n\nconst limiter = rateLimit({\n  strategy: gcra({ limit: 1000, periodMs: 60_000, burst: 100 }),\n  store,         // one EVALSHA per check, fully atomic — no read-then-write race\n  prefix: \"api\", // namespace, so one store can back many limiters\n});\n\nconst d = await limiter.check(userId);\n```\n\nBuilt-in strategies run their atomic Lua form in a single `EVALSHA` (with an `EVAL` fallback on `NOSCRIPT`); custom strategies fall back to optimistic concurrency (`WATCH`/`MULTI`/`EXEC`). `RedisStore` derives `now` from the Redis server clock, so node clock skew never corrupts shared state.\n\n**Any Redis client — including serverless / edge.** `RedisStore` speaks the `ioredis` shape directly; for the official **node-redis** client or the **Upstash REST** client (where TCP isn't allowed), wrap it:\n\n```ts\nimport { RedisStore, fromNodeRedis, fromUpstash } from \"throttlekit/redis\";\n\nnew RedisStore({ client: new Redis(url) });                  // ioredis — straight through\nnew RedisStore({ client: fromNodeRedis(nodeRedisClient) });  // official `redis` client\nnew RedisStore({ client: fromUpstash(Upstash.fromEnv()) });  // Upstash REST — serverless/edge\n```\n\nAll three are proven bit-identical to the in-process path by the conformance suite (ioredis and node-redis tested against a live server). Upstash REST has no interactive `WATCH`/`MULTI`, so it supports the Lua-backed built-ins only. See [`examples/redis-distributed.ts`](./examples/redis-distributed.ts).\n\n### PostgreSQL — no Redis required\n\nAlready running Postgres? You don't need to add Redis. 
`PostgresStore` is a fully distributed backend:\n\n```ts\nimport { rateLimit, gcra } from \"throttlekit\";\nimport { PostgresStore } from \"throttlekit/postgres\";\nimport { Pool } from \"pg\";\n\nconst store = new PostgresStore({ pool: new Pool({ connectionString: process.env.DATABASE_URL }), prefix: \"api\" });\nconst limiter = rateLimit({ strategy: gcra({ limit: 1000, periodMs: 60_000, burst: 100 }), store });\nconst d = await limiter.check(userId);\n```\n\nIt runs the **same pure JS transform** the in-memory store runs — no Postgres-specific algorithm to keep in sync — inside a transaction serialized per key by a transaction-scoped **advisory lock** (`pg_advisory_xact_lock`, which serializes first-touch keys that `SELECT … FOR UPDATE` cannot). So concurrent checks are atomic (N simultaneous at limit K admit exactly K, proven on a live server) and decisions are bit-identical to the other backends. Pass a `pg.Pool` directly — no adapter. Each check is one transaction; for hot keys, use it as the L2 of `twoTier({ mode: \"leased\" })`. See [`examples/postgres.ts`](./examples/postgres.ts).\n\n### Two-tier — local cache in front of the network\n\nFront the distributed store (L2) with a local in-process tier (L1) and pick the consistency/throughput trade-off:\n\n```ts\nimport { twoTier, gcra } from \"throttlekit\";\n\nconst limiter = twoTier({\n  strategy: gcra({ limit: 10_000, periodMs: 60_000, burst: 500 }),\n  l2: store,                                   // a distributed store, e.g. 
RedisStore\n  mode: \"leased\",                              // \"strict\" | \"cached-deny\" | \"leased\"\n  lease: { batch: 50, windowCoupled: true },   // lease 50 at a time; expire at the L2 window\n});\n```\n\n| Mode | Network cost | Global accuracy | Best for |\n|---|---|---|---|\n| `strict` | 1 round trip / request | Exact | Hard quotas, billing |\n| `cached-deny` | 1 round trip / *allowed* request | Exact for allows, local for denies | Public APIs under abuse |\n| `leased` | ~1 round trip / `batch` requests | Bounded overshoot (below) | High-throughput internal APIs |\n\n`leased` trades exactness for throughput, with a **provably bounded** worst-case overshoot you choose:\n\n- **Default (carryover):** `admitted ≤ Limit + N·(Batch−1)` — tight, but grows with the fleet size `N`.\n- **`windowCoupled: true`:** credits expire at the L2 window boundary, so `admitted ≤ Limit` — **independent of `N`**. Opt-in; default off preserves legacy behaviour. `twoTier`'s `check` is async (`checkSync` throws — L2 is asynchronous).\n\n\u003e **Formally verified — and independent of fleet size.** These bounds are proven, not claimed. A [TLA⁺ spec](./spec/DistributedLeasing.tla) is **model-checked with TLC** — carryover overshoot is *exactly* `Limit + N·(Batch−1)` (a counterexample shows it's tight, not loose) — and window-coupling tightens it to **exactly `Limit`, independent of N** ([second spec](./spec/GaleWindowCoupledLeasing.tla) + a Java-free [exhaustive checker](./test/gale/leasing-variants.test.ts) reproducing both in CI). This is the shipped core of **GALE**, a research program on provable distributed leasing — adding learned (online-EOQ) lease sizing, weighted work-conserving fairness, and a proved overshoot/coordination/utilization **trilemma** lower bound, each machine-checked or measured under [`research/gale/`](./research/gale). 
Details: [`docs/FORMAL-MODEL.md`](./docs/FORMAL-MODEL.md).\n\n### Multi-region\n\nA global limit across regions is the leased model with the **regions as the leasing nodes** and one shared L2. Each region serves the bulk of its traffic from a local lease — region-local latency, no per-request cross-region hop — and the *same* verified bound caps the **worldwide** overshoot:\n\n```text\nglobal admitted per window  ≤  Limit + regions × (batch − 1)   (carryover)\n                            ≤  Limit                            (windowCoupled — any number of regions)\n```\n\nSo 4 regions leasing `batch: 50` against a global `limit: 10_000` admit at most `10_196` worldwide under carryover (\u003c 2% overshoot for ~1 cross-region hop per 50 requests) — or **exactly `10_000` with `windowCoupled`, no matter how many regions**. There's no separate multi-region engine to trust — it's `twoTier` leased at a shared store. See [`examples/multi-region.ts`](./examples/multi-region.ts).\n\n---\n\n## Limiting on several axes\n\nLimit on per-IP **and** per-user **and** per-route at once. `all({...})` allows only if **every** dimension allows, and consumes nothing unless all allow (no partial-consume). `any({...})` allows if any permits. 
Pass the result to **`multiRateLimit`** (not `rateLimit`):\n\n```ts\nimport { all, gcra, fixedWindow, multiRateLimit } from \"throttlekit\";\n\ninterface Ctx { ip: string; userId: string; route: string }\n\nconst limiter = multiRateLimit\u003cCtx\u003e({\n  store, // on Redis, all dimensions fuse into one atomic Lua round trip\n  strategy: all\u003cCtx\u003e({\n    ip:    { key: (c) =\u003e c.ip,     strategy: gcra({ limit: 100, periodMs: 60_000 }) },\n    user:  { key: (c) =\u003e c.userId, strategy: gcra({ limit: 1000, periodMs: 60_000 }) },\n    route: { key: (c) =\u003e c.route,  strategy: fixedWindow({ limit: 50, windowMs: 1_000 }) },\n  }),\n});\n\nconst d = await limiter.check({ ip, userId, route: \"/search\" });\n```\n\nThe returned `Decision` reflects the binding constraint (the denying dimension, or the smallest `remaining` when allowed). Dimensions support per-dimension `cost`. On Redis, multi-dimensional checks support `gcra`, `tokenBucket`, and `fixedWindow`. See [`examples/multi-dimensional.ts`](./examples/multi-dimensional.ts).\n\n---\n\n## Backpressure and shaping\n\n### Adaptive concurrency\n\nNot a rate — a dynamically inferred ceiling on *in-flight* requests, derived from the latency gradient (`RTT_noload / RTT_actual`) and adjusted with a congestion-control sawtooth.\n\n```ts\nimport { adaptiveConcurrency } from \"throttlekit\";\n\nconst guard = adaptiveConcurrency({ minLimit: 4, maxLimit: 512, algorithm: \"gradient2\" });\n\nconst lease = guard.acquire();\nif (!lease.ok) return; // over the inferred ceiling — shed load (e.g. 503)\ntry {\n  await handle(request);\n} finally {\n  lease.release(); // latency measured automatically; pass { dropped: true } on a failure/timeout\n}\n```\n\nIntrospect with `guard.limit`, `guard.inflight`, `guard.stats()`. Algorithms: `\"gradient2\"` (default) or `\"aimd\"`. 
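The gradient idea above can be sketched in a few lines. This is a minimal illustration, not ThrottleKit's implementation: the `updateLimit` function, the 0.5 floor on the gradient, and the `headroom` term are assumptions borrowed from common gradient-style limiters; only `minLimit` and `maxLimit` come from the options shown above.

```ts
// Hypothetical sketch of a gradient-style limit update (not the library's code).
interface GradientState {
  limit: number;       // current ceiling on in-flight requests
  rttNoLoadMs: number; // lowest round-trip time seen so far
}

interface GradientOptions {
  minLimit: number;
  maxLimit: number;
  headroom: number; // assumed additive term that lets the limit probe upward
}

// Assumes rttActualMs > 0.
function updateLimit(
  state: GradientState,
  rttActualMs: number,
  opts: GradientOptions,
): GradientState {
  // Track the best-case latency as the no-load baseline.
  const rttNoLoadMs = Math.min(state.rttNoLoadMs, rttActualMs);
  // Gradient is 1 when latency matches the baseline and shrinks as latency grows.
  // The 0.5 floor (an assumption) caps how sharply one sample can cut the limit.
  const gradient = Math.max(0.5, Math.min(1, rttNoLoadMs / rttActualMs));
  const proposed = Math.floor(state.limit * gradient + opts.headroom);
  const limit = Math.max(opts.minLimit, Math.min(opts.maxLimit, proposed));
  return { limit, rttNoLoadMs };
}

const opts: GradientOptions = { minLimit: 4, maxLimit: 512, headroom: 4 };
let state: GradientState = { limit: 100, rttNoLoadMs: 10 };
state = updateLimit(state, 10, opts); // latency at baseline: limit grows to 104
state = updateLimit(state, 20, opts); // latency doubled: limit drops to 56
```

Repeating this update on every completed request produces the sawtooth: the limit creeps up while latency stays near the baseline and falls back when latency rises.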
See [`examples/adaptive-concurrency.ts`](./examples/adaptive-concurrency.ts).\n\n### Leaky-bucket scheduling\n\n`leakyBucket` builds a `Shaper` that *delays* rather than rejects, smoothing bursty input to a steady output rate — handy for pacing outbound calls to a third-party budget.\n\n```ts\nimport { leakyBucket, QueueFullError } from \"throttlekit\";\n\nconst shaper = leakyBucket({ ratePerSec: 5, maxQueueMs: 2_000 });\ntry {\n  await shaper.schedule(\"upstream-api\"); // resolves after the paced delay\n  await callUpstream();\n} catch (err) {\n  if (err instanceof QueueFullError) console.warn(\"queue full, retry in\", err.retryAfterMs, \"ms\");\n}\n```\n\n`reserve(key, cost?)` returns a `Reservation` (`{ accepted, delayMs }`) without sleeping; `schedule` waits or throws `QueueFullError`. `reserveSync` is available on a sync store. See [`examples/leaky-bucket.ts`](./examples/leaky-bucket.ts).\n\n---\n\n## Surviving a flood\n\nPer-key stores keep one record per active key — which, under a flood of *millions of distinct* keys (every source IP in a volumetric attack), makes that state itself the memory-exhaustion vector. `sketchRateLimit` limits an **unbounded key universe in fixed memory** with a Count-Min Sketch: ~**7.4 KB total**, regardless of how many keys it sees.\n\n```ts\nimport { sketchRateLimit } from \"throttlekit\";\n\nconst shield = sketchRateLimit({ limit: 100, windowMs: 60_000 }); // ε=0.01, δ=0.001 by default\nif (!shield.checkSync(clientIp).allowed) return reject(429);       // sync or async (check)\n```\n\nBecause the sketch never *under*counts, `allowed` implies the true admitted count is ≤ `limit` — it never over-admits (a hard, non-probabilistic property). Its only error is the safe direction: it may deny a key slightly early once collisions inflate its estimate, bounded by `ε·N` w.p. 
`≥ 1−δ` ([Cormode \u0026 Muthukrishnan 2005](http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf); conservative-update from Estan \u0026 Varghese).\n\n**Cluster-wide (`mergeableSketch`).** A low-and-slow distributed attacker can stay under every node's threshold while flooding the fleet. Because CMS counters are linear, each node keeps its own fixed-memory sketch, ships it as compact bytes (`snapshot()` / `toBytes()`), and `merge()`s peers' — the sum is *exactly* the sketch of the whole cluster's traffic, so the global heavy hitter becomes visible everywhere. Honestly scoped as eventually-consistent **detection**, not a strongly-consistent global limit. See [`examples/distributed-sketch.ts`](./examples/distributed-sketch.ts).\n\n---\n\n## Overload and fairness\n\nTwo admission-control primitives that sit *upstream* of the per-key limiters.\n\n**`adaptiveThrottle`** — Google-SRE [client-side adaptive throttling](https://sre.google/sre-book/handling-overload/). A client that keeps hammering an overloaded backend only deepens the overload; this sheds a growing fraction *locally* based on the backend's recent accept rate:\n\n```ts\nimport { adaptiveThrottle } from \"throttlekit\";\n\nconst throttle = adaptiveThrottle({ k: 2 }); // shed once sending \u003e 2× what the backend accepts\nif (!throttle.request()) return failFast();   // shed locally, don't even try\ntry { const res = await callBackend(); throttle.record(res.ok); }\ncatch { throttle.record(false); }            // feed outcomes back so it self-corrects\n```\n\n`p = max(0, (requests − K·accepts) / (requests + 1))` over a rolling window; `request(priority)` protects critical traffic.\n\n**`fairShare`** — split one global per-window budget across tenants so a greedy tenant can't starve the others (each active tenant guaranteed ≥ `limit/N`, global total ≤ `limit`):\n\n```ts\nimport { fairShare } from \"throttlekit\";\nconst fair = fairShare({ limit: 1000, windowMs: 60_000 });\nconst d = 
fair.checkSync(tenantId); // d.limit is this tenant's current fair cap\n```\n\n**`weightedFairShare` / `weightedMaxMin`** — weighted fairness (*Weighted Fair Escrow*). Give tenants priority weights and the budget splits in proportion; `weightedMaxMin` is the exact, **work-conserving** weighted max-min split (an idle tenant's share flows to the backlogged ones; every backlogged tenant gets at least its weighted floor):\n\n```ts\nimport { weightedFairShare, weightedMaxMin } from \"throttlekit\";\nconst fair = weightedFairShare({ limit: 1000, windowMs: 60_000, weightOf: (t) =\u003e t === \"pro\" ? 4 : 1 });\nweightedMaxMin([100, 100, 100, 100], [4, 1, 1, 1], 100); // → [57, 15, 14, 14]: weighted, sums to the budget\n```\n\n`weightedFairShare` is the online streaming limiter (weighted caps, honest online caveats); `weightedMaxMin` is the proven batch allocator (use it to divide, e.g., a `twoTier` node's leased batch among local tenants). Both are the shipped piece of *Weighted Fair Escrow*; its four fairness properties are machine-checked in the [research track](./research/gale/PILLAR4-fairness.md). Fully-distributed weighted *leasing* (weighted L2 grants across nodes) remains research.\n\n---\n\n## Deterministic time\n\nTime is injected everywhere — no `Date.now()` hides inside an algorithm — so every limit is reproducible to the millisecond.\n\n```ts\nimport { rateLimit, gcra, ManualClock, MemoryStore } from \"throttlekit\";\n\nconst clock = new ManualClock(0);\nconst limiter = rateLimit({\n  strategy: gcra({ limit: 2, periodMs: 1_000 }), // burst defaults to 2\n  clock,\n  store: new MemoryStore({ clock }),\n});\n\n(await limiter.check(\"k\")).allowed; // true\n(await limiter.check(\"k\")).allowed; // true\n(await limiter.check(\"k\")).allowed; // false — burst exhausted\nclock.advance(500);                 // one emission interval (1000/2) later\n(await limiter.check(\"k\")).allowed; // true\n```\n\n`ManualClock` exposes `.advance(ms)`, `.set(ms)`, `.now()`. 
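To see why injected time makes the trace above reproducible, here is a minimal GCRA check written as a pure function of the stored state and the current time. It is a sketch under assumptions, not ThrottleKit's source: `gcraCheck`, its parameter order, and the result shape are hypothetical; the emission interval `periodMs / limit` and the default `burst = limit` follow the description above.

```ts
// Hypothetical pure-function GCRA (illustrative names, not the library's API).
interface GcraConfig {
  limit: number;
  periodMs: number;
  burst?: number; // defaults to `limit`
}

interface GcraResult {
  allowed: boolean;
  tat: number;          // theoretical arrival time to store for the next check
  retryAfterMs: number; // 0 when allowed
}

function gcraCheck(
  tat: number | undefined,
  nowMs: number,
  cfg: GcraConfig,
  cost = 1,
): GcraResult {
  const interval = cfg.periodMs / cfg.limit;   // emission interval
  const burst = cfg.burst ?? cfg.limit;
  const tolerance = interval * burst;          // how far the TAT may run ahead of now
  const base = Math.max(tat ?? nowMs, nowMs);  // an idle key restarts from now
  const newTat = base + interval * cost;
  const allowAt = newTat - tolerance;          // earliest time this request conforms
  if (allowAt > nowMs) {
    // Denied: state is left untouched, so a rejection never consumes budget.
    return { allowed: false, tat: tat ?? nowMs, retryAfterMs: allowAt - nowMs };
  }
  return { allowed: true, tat: newTat, retryAfterMs: 0 };
}

// Replays the trace above: limit 2 per 1000 ms, burst defaulting to 2.
const cfg: GcraConfig = { limit: 2, periodMs: 1000 };
const first = gcraCheck(undefined, 0, cfg);    // allowed
const second = gcraCheck(first.tat, 0, cfg);   // allowed
const third = gcraCheck(second.tat, 0, cfg);   // denied, retry after 500 ms
const fourth = gcraCheck(third.tat, 500, cfg); // allowed one interval later
```

Because the function takes `nowMs` as an argument and reads no ambient clock, the same inputs always give the same decision, which is what a manual clock relies on.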
See [`examples/basic-memory.ts`](./examples/basic-memory.ts).\n\n---\n\n## Headers, IPs, and PII\n\n**Standards-compliant headers.** `buildRateLimitHeaders(decision, opts)` produces a plain `Record\u003cstring, string\u003e` (the adapters call it for you), in three families via `emit`: **`draft`** *(default)* — the IETF `RateLimit-Limit`/`-Remaining`/`-Reset` triple; **`structured`** — RFC 9651 `RateLimit` + `RateLimit-Policy`; **`legacy`** — the `X-RateLimit-*` triple. On a denial a `Retry-After` (delta-seconds, min 1) is always added, and all time math derives from the injected `now`.\n\n**Trusted proxy \u0026 IPv6 aggregation.** Trusting `X-Forwarded-For` blindly is the classic bypass. `clientIp` refuses to: the default is `trustProxy: false` (use the socket peer), trust is opt-in as a hop count or CIDR allowlist, and it aggregates IPv6 to a configurable prefix (`/64` default) so one customer can't rotate through billions of addresses.\n\n```ts\nimport { clientIp } from \"throttlekit\";\n\nconst key = clientIp(\n  { remoteAddr: req.socket.remoteAddress ?? \"\", xForwardedFor: req.headers[\"x-forwarded-for\"] },\n  { trustProxy: [\"10.0.0.0/8\"], ipv6Prefix: 64 }, // or trustProxy: 1 for a single hop\n);\n```\n\nThe Express and `fetch` adapters accept `trustProxy`/`ipv6Prefix` directly and derive this key by default.\n\n**PII-safe keys (HMAC).** Hash raw identifiers with a server secret before they reach the store, so a shared Redis never holds the raw value:\n\n```ts\nimport { hmacKeyer } from \"throttlekit\";\nconst keyer = hmacKeyer(process.env.RL_SECRET ?? \"\");\nawait limiter.check(keyer(rawUserId));\n```\n\n---\n\n## Observability\n\nEvery `Decision` is a plain, loggable object. 
For metrics, the optional OpenTelemetry layer (`throttlekit/otel`) wraps a limiter or guard with your own `Meter`:\n\n```ts\nimport { instrumentLimiter, instrumentGuard } from \"throttlekit/otel\";\nimport { metrics } from \"@opentelemetry/api\";\n\nconst meter = metrics.getMeter(\"my-service\");\nconst observed = instrumentLimiter(limiter, meter); // throttlekit.checks / .remaining / .store.latency\ninstrumentGuard(guard, meter);                       // concurrency.limit / .inflight / .rtt_noload\n```\n\nFor zero-config insight without a metrics backend, wrap a limiter with **`withAnalytics`** — it tracks allow/deny counts and the **top-K heavy hitters** (keys driving the most traffic and denials) in bounded memory via **Space-Saving** (Metwally et al. 2005), so your worst offenders surface even under a flood of unique keys:\n\n```ts\nimport { withAnalytics, rateLimit, gcra } from \"throttlekit\";\n\nconst limiter = withAnalytics(rateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }) }));\nawait limiter.check(clientIp); // use exactly like any limiter\nconst a = limiter.analytics(); // { allowed, denied, total, denyRate, topRequested: [...], topDenied: [...] }\n```\n\n---\n\n## When the backend goes down\n\nThe in-process `MemoryStore` never fails. A distributed store can: if Redis is unreachable, `check()` rejects (`StoreUnavailableError`). 
**You decide what that means** — every adapter takes a `fail` policy and fires `onError` before applying it:\n\n| `fail` | On a store outage | Use when |\n|---|---|---|\n| `\"open\"` *(default)* | Allow the request | Availability \u003e the cap — most public APIs |\n| `\"closed\"` | Reject with `503` | The cap is a hard guarantee — billing, abuse-critical paths |\n\n```ts\nexpressRateLimit({\n  strategy: gcra({ limit: 100, periodMs: 60_000 }),\n  store: redisStore,\n  fail: \"closed\",\n  onError: (_req, _res, err) =\u003e log.warn({ err }, \"rate limiter store down\"),\n});\n```\n\nTwo extra hedges: **`twoTier` leased** keeps serving from the local lease while L2 is briefly unreachable, and the Redis path is a single atomic round trip (no read-then-write window to interrupt). Both fail modes are tested on every adapter.\n\n---\n\n## Performance\n\nIn-process, single hot key (Node 24, reproducible via `npm run bench`; numbers vary ~±10%):\n\n- **`checkSync` (GCRA): ~3.1M ops/s, ~320 ns/op, allocation-free.**\n- `check` (async, GCRA): **~1.7M ops/s** (~600 ns/op).\n- Redis: exactly **one** `EVALSHA` round trip per check.\n\n**Head-to-head, the honest version** (`npm run bench:compare`, same machine/process/warmup):\n\n- **Sync:** one of the few JS limiters with a synchronous API at all, and it's allocation-free.\n- **Redis:** roughly **tied** with `rate-limiter-flexible` (both one atomic Lua round trip), with a tighter tail.\n- **Async in-memory:** the counter-based libraries are **faster** (~2–5M vs ~1.3–1.7M ops/s) — the cost of GCRA over a bounded-memory store vs a plain counter; all far past per-process need.\n- **Postgres:** a single bare check **trails** `rate-limiter-flexible`'s one-statement upsert (~3×, by design — one generic transaction per strategy); under load, `twoTier(leased)` amortizes it into a large throughput win.\n\nThe full table — algorithms labelled, methodology, and every place ThrottleKit loses — is in 
[SCOREBOARD.md](./SCOREBOARD.md).\n\n---\n\n## Migrating\n\n**From `express-rate-limit`:**\n\n```ts\n// before\napp.use(rateLimit({ windowMs: 60_000, limit: 100 }));\n// after — GCRA by default (smooth pacing, no 2× boundary burst), same standards headers\nimport { expressRateLimit } from \"throttlekit/express\";\nimport { gcra } from \"throttlekit\";\napp.use(expressRateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }) }));\n// want the classic window? swap in fixedWindow({ limit: 100, windowMs: 60_000 })\n```\n\n**From `rate-limiter-flexible`:**\n\n```ts\n// before — throws on exhaustion\ntry { await rl.consume(key); } catch { /* respond 429 */ }\n// after — one atomic Lua round trip, a Decision object instead of throw-on-deny\nimport { rateLimit, gcra } from \"throttlekit\";\nimport { RedisStore } from \"throttlekit/redis\";\nconst limiter = rateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }), store: new RedisStore({ client: redis }) });\nconst d = await limiter.check(key);\nif (!d.allowed) { /* respond 429 with d.retryAfterMs */ }\n```\n\n---\n\n## Recipes\n\n```ts\n// Tiered plans (free / pro) by API key — one store, namespaced per tier\nconst limiters = {\n  free: rateLimit({ strategy: gcra({ limit: 60, periodMs: 60_000 }), store, prefix: \"free\" }),\n  pro:  rateLimit({ strategy: gcra({ limit: 1_000, periodMs: 60_000 }), store, prefix: \"pro\" }),\n};\nconst d = await limiters[planFor(req)].check(apiKeyOf(req));\n\n// Cost-weighted endpoints — charge expensive routes more from the same budget\nawait limiter.check(apiKeyOf(req), routeIsExpensive(req) ? 5 : 1);\n```\n\n**Per-IP *and* per-route in one round trip** — see [Limiting on several axes](#limiting-on-several-axes). **Tiered burst + sustained** — compose two GCRA limiters (e.g. 
10/sec *and* 1000/hour) and allow only if both pass.\n\n---\n\n## How it's tested\n\nThrottleKit is built to be checkable, not just claimed:\n\n- **Dual-path conformance** — thousands of generated `(arrivals, costs, clock)` timelines run through both the JS and Lua path of each strategy; the two must produce identical decision streams.\n- **Property tests** (fast-check) — invariants like \"`remaining` never negative\" and \"leased overshoot ≤ documented bound\" under randomized inputs.\n- **Atomicity** — fire N simultaneous checks at limit K and assert exactly K allowed, for MemoryStore and (env-gated) real Redis and Postgres.\n- **Formal model** — the leasing protocol is model-checked with TLA⁺/TLC and re-checked by an exhaustive JS checker in CI ([`docs/FORMAL-MODEL.md`](./docs/FORMAL-MODEL.md)).\n- **Research track (GALE)** — that bound is the foundation of an ongoing program: overshoot *independent of fleet size*, learned (online-EOQ) lease sizing, weighted work-conserving fairness, and a proved *trilemma* lower bound — each machine-checked or measured under [`research/gale/`](./research/gale). (Research modules; not public API beyond `lease.windowCoupled`.)\n- **Store conformance kit** — `runStoreConformance` from `throttlekit/testkit` runs any custom store through the same atomicity / TTL / concurrency suite the built-ins pass.\n\nAll time-dependent tests use `ManualClock`, so the suite is deterministic. 
Current state: **389 tests, 95.2% line coverage**, CI green across Node 20/22/24 — tracked in [SCOREBOARD.md](./SCOREBOARD.md).\n\n---\n\n## Design and docs\n\n- [THROTTLEKIT.md](./THROTTLEKIT.md) — full design and architecture.\n- [SCOREBOARD.md](./SCOREBOARD.md) — benchmarks, correctness guarantees, feature matrix.\n- [docs/FORMAL-MODEL.md](./docs/FORMAL-MODEL.md) — the formally-verified leasing bound.\n- [research/gale/](./research/gale) — the GALE research track: provable distributed leasing (overshoot independent of fleet size, learned lease sizing, weighted fairness, the trilemma lower bound).\n- [CHANGELOG.md](./CHANGELOG.md) — release history.\n- [`examples/`](./examples) — a runnable file for every section above.\n\n---\n\n## Status\n\nThrottleKit is `0.x`: feature-complete and heavily tested, but young — the public API may still be refined before a `1.0` that commits to SemVer stability. MIT-licensed and developed in the open.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fameyaborkar%2Fthrottlekit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fameyaborkar%2Fthrottlekit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fameyaborkar%2Fthrottlekit/lists"}