https://github.com/platformatic/coordinator
https://github.com/platformatic/coordinator
Last synced: 12 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/platformatic/coordinator
- Owner: platformatic
- License: apache-2.0
- Created: 2026-05-02T05:45:46.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-01T05:30:18.000Z (12 days ago)
- Last Synced: 2026-06-01T07:18:41.232Z (12 days ago)
- Language: TypeScript
- Size: 259 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# @platformatic/coordinator
Multi-pod destination routing for stateful tiers. Valkey-backed registry, pod-side `Member` class, allocation strategies, lock routing, TTL cache, and optional Fastify helpers.
See [`coordinator-pattern.md`](./coordinator-pattern.md) for the architecture this library implements: caller / coordinator / resource pod, failover, fan-out, transactions and locks.
## What it solves
You have N pods that hold stateful resources (PostgreSQL connection pools, agent processes, sandboxes, simulations). Requests carry a routing key -- a "destination" or "instance" id. You need every request for that destination to land on a pod that owns it. Pods die; surviving pods should take over. Under sustained load, a single destination may need to live on more than one pod.
This library handles all of that:
- Destination → pod set, persisted in Valkey
- Pod self-registration + heartbeat, with `total_connections` published as a metric
- Atomic first-touch claim, atomic failover (SREM dead + SADD fresh)
- Pluggable allocation strategies (round-robin, least-loaded by total connections, random)
- Lock routing for transaction-bound calls (lockId → pod, resolved via Valkey)
- A short-lived local cache for the hot resolution path
- Fastify helpers for HTTP coordinators
## Install
```sh
npm install @platformatic/coordinator
```
Peer dependency: `fastify >= 5` when using the Fastify plugin or helpers. `@fastify/reply-from` is a runtime dependency of this package — you don't need to install it separately.
## Quick start (Fastify plugin)
```ts
import Fastify from 'fastify'
import coordinatorPlugin from '@platformatic/coordinator'
const app = Fastify()
await app.register(coordinatorPlugin, {
redis: 'redis://valkey:6379',
keyPrefix: 'myservice',
strategy: 'least-loaded'
})
app.get('/destinations/:id/work', app.coordinator.lookupAndProxy({
destinationFrom: req => req.params.id,
claimOnMiss: true,
reassignOrphans: true
}))
app.post('/destinations', app.coordinator.pickAndRegister({
registerIdFrom: res => res.id
}))
app.delete('/destinations/:id', app.coordinator.lookupAndDeregister({
destinationFrom: req => req.params.id
}))
app.post('/transactions/:lockId/work', app.coordinator.lookupLockAndProxy({
lockFrom: req => req.params.lockId
}))
await app.listen({ port: 3000 })
```
The plugin:
- Registers `@fastify/reply-from` (idempotently — skipped if already registered)
- Constructs a `Registry` from the passed options (or reuses one you provide via `registry`)
- Exposes both the registry and the route-handler-factory helpers on `app.coordinator`
- Closes the registry on `app.close()` (unless you brought your own)
Options:
```ts
interface CoordinatorPluginOptions {
// Forwarded to new Registry(...) when no `registry` is supplied:
redis?: string // redis/valkey URL
keyPrefix?: string // default 'coordinator'
strategy?: 'round-robin' | 'least-loaded' | 'random' | AllocationStrategy
cache?: { ttl?: number, max?: number } | false
requestTimeout?: number
registry?: Registry // reuse an existing Registry (plugin will not close it)
decorateAs?: string // default 'coordinator'
replyFrom?: FastifyReplyFromOptions // forwarded to @fastify/reply-from
registerReplyFrom?: boolean // default true; set false if you already registered reply-from
}
```
The legacy standalone helper imports (`import { lookupAndProxy } from '@platformatic/coordinator'`) still work and are documented below. They require manually registering `@fastify/reply-from` and constructing the `Registry`.
## Valkey layout
With `keyPrefix: 'myservice'`:
| Key | Type | Owner | Purpose |
|---|---|---|---|
| `myservice:members` | set | pod | the set of memberIds known to be live |
| `myservice:member:` | hash with `address`, `load` | pod | live pod registration and load metric, TTL refreshed by heartbeat |
| `myservice:destination:` | set of memberIds | coordinator + pod | pods currently serving this destination |
| `myservice:lock:` | hash with `podId`, `destinationId`, metadata | pod | lockId routing for transaction-bound calls |
## Pod side: `Member`
The pod-side class owns its own iovalkey connection and writes the keys the pod is responsible for.
```ts
import { Member } from '@platformatic/coordinator'
const member = new Member({
redis: 'redis://valkey:6379',
memberId: 'pod-1',
address: 'http://pod-1.local:3000',
keyPrefix: 'myservice',
ttl: 30, // seconds; default 30
getLoad: () => pool.openCount() // optional; default () => 0
})
await member.register() // SADD + HSET + EXPIRE
const heartbeat = setInterval(() => member.heartbeat(), 10_000) // HSET + EXPIRE
heartbeat.unref()
// When this pod fans itself in to a destination:
await member.addToDestination(destId)
await member.removeFromDestination(destId)
// When this pod mints / releases a transaction lock:
await member.registerLock(lockId, destId, { isolationLevel: 'serializable' })
await member.unregisterLock(lockId)
// Peer query for fan-out picks (returns live pods with their load):
const peers = await member.listPeerLoad()
// Graceful shutdown:
clearInterval(heartbeat)
await member.deregister()
await member.close()
```
## Coordinator side: `Registry`
```ts
import { Registry } from '@platformatic/coordinator'
const registry = new Registry({
redis: 'redis://valkey:6379',
keyPrefix: 'myservice',
strategy: 'least-loaded',
cache: { ttl: 5000, max: 10_000 } // default; pass `false` to disable
})
// Hot path: resolve a destination, pick one pod, return its address.
const resolved = await registry.resolveDestination(destId, {
claimOnMiss: true, // SADD a fresh pod if the destination's set is empty
reassignOrphans: true // SREM dead + SADD fresh if every pod in the set is dead
})
if (resolved) {
// { address, memberId, reassigned }
}
// Lock-bound call: route by lockId, not destination.
const lockRouting = await registry.resolveLock(lockId)
if (lockRouting) {
// { address, memberId }
}
// Other primitives:
await registry.listLiveMembers() // [{ memberId, address, load }, ...]
await registry.pickMember({ destinationId: destId })
await registry.addPodToDestination(destId, memberId)
await registry.hasBinding(destId)
await registry.deregisterDestination(destId)
await registry.close()
```
## Resolution and failover
`resolveDestination` reads the destination's pod set, filters by liveness, and applies the allocation strategy. The four cases:
| Set state | `claimOnMiss` | `reassignOrphans` | Result |
|---|---|---|---|
| Empty | false | -- | `null` (404 territory) |
| Empty | true | -- | Pick a live pod, `SADD` it, return |
| Has live pods (possibly with dead too) | -- | -- | Pick one of the live pods; dead ones cleaned up in background |
| All dead, non-empty | -- | false | `null` |
| All dead, non-empty | -- | true | Pick fresh, `SREM` dead + `SADD` fresh, return with `reassigned: true` |
All writes use `SADD` / `SREM` (atomic). Concurrent first-touch by two coordinators can produce a destination with two pods from the start. That's a valid steady state, not a corrupted one.
## Allocation strategies
Pluggable. Built-in: `round-robin` (default), `least-loaded`, `random`. Custom strategies implement:
```ts
interface AllocationStrategy {
pick (candidates: MemberInfo[], ctx: { destinationId?: string }): MemberInfo | null
}
```
`candidates` is the pool to choose from -- the full live set on first-touch / failover, or the live members of a destination's pod set on the hot path for fanned-out destinations. `ctx.destinationId` is the destination, so custom strategies can branch on it (for example, pin "dedicated" tenants to a designated subset of pods and round-robin "shared" tenants across the rest).
Built-in least-loaded reads `load` from each candidate's member record (`HGET` pipeline). It runs at first touch for single-pod destinations and on every request for fanned-out destinations.
## TTL cache
`resolveDestination` checks a local LRU+TTL cache before reading Valkey. Default 5 s TTL, 10 000 entries. Configure with `cache: { ttl, max }` or disable with `cache: false`. Writes through the registry (`addPodToDestination`, `deregisterDestination`) evict the affected key. Each replica has its own cache.
## Fastify helpers (standalone, advanced)
The helpers used internally by `app.coordinator.*` are also exported as standalone functions, for users who want to manage their own `Registry` and reply-from registration. Each emits a tagged result via an optional `onResult` callback so presets can hook their own metric counters.
Before mounting any helper-backed route you must register `@fastify/reply-from` (the `coordinatorPlugin` does this for you):
```ts
import Fastify from 'fastify'
import replyFrom from '@fastify/reply-from'
import { Registry, lookupAndProxy } from '@platformatic/coordinator'
const app = Fastify()
await app.register(replyFrom)
const registry = new Registry({ redis, keyPrefix })
app.get('/x/:id', lookupAndProxy(registry, { destinationFrom: r => r.params.id }))
```
### `lookupAndProxy`
```ts
app.post('/destinations/:id/work', lookupAndProxy(registry, {
destinationFrom: req => req.params.id,
reassignOrphans: true,
onResult: result => metrics.inc({ type: 'work', result }) // 'hit' | 'orphan_reassigned' | 'not_found'
}))
```
Resolves the destination, proxies via `reply.from`, returns 404 if the destination has no live pod.
### `pickAndRegister`
```ts
app.post('/destinations', pickAndRegister(registry, {
registerIdFrom: res => res.id
}))
```
Picks a pod, proxies the create request, and `SADD`s the returned id to the destination set only on a 2xx upstream response. Returns 503 if there are no live pods.
### `lookupAndDeregister`
```ts
app.delete('/destinations/:id', lookupAndDeregister(registry, {
destinationFrom: req => req.params.id
}))
```
Resolves, proxies the delete; on `expectedStatus` (204 by default), `DEL`s the destination set. If the destination has only dead pods, skips the proxy and just deletes the set ("deregistered_dead_pod").
All four helpers go through `@fastify/reply-from`. The `coordinatorPlugin` registers it automatically; if you use the standalone helpers, you must register it yourself.
### `lookupLockAndProxy`
```ts
app.post('/transactions/:lockId/work', lookupLockAndProxy(registry, {
lockFrom: req => req.params.lockId
}))
```
Resolves the lockId to the pod that owns it (via `Registry.resolveLock`) and proxies through. 404s if the lockId is unknown.
## Testing
Unit tests need a Redis on `127.0.0.1:6390`. E2E tests also need a Postgres on `127.0.0.1:15432` (storage/storage/storage). Both are in the included `docker-compose.yml`.
```sh
pnpm run test:deps:up # brings up redis + postgres
pnpm test # unit tests
pnpm run test:e2e # end-to-end tests (uses the storage-db example)
pnpm run test:deps:down
```
URLs are read from `REDIS_URL` (default `redis://127.0.0.1:6390`) and `PG_URL` (default `postgresql://storage:storage@127.0.0.1:15432/storage`). Tests isolate keys with a random prefix and clean up after themselves.
## License
Apache-2.0