{"id":50695411,"url":"https://github.com/kckempf/yallmap","last_synced_at":"2026-06-09T06:06:19.588Z","repository":{"id":362566191,"uuid":"1251721558","full_name":"kckempf/yallmap","owner":"kckempf","description":"An OpenTelemetry-instrumented gateway for Anthropic-compatible LLMs","archived":false,"fork":false,"pushed_at":"2026-06-04T21:11:04.000Z","size":540,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-04T22:15:01.284Z","etag":null,"topics":["ai-gateway","anthropic","claude","claude-code","langfuse","llm-gateway","llm-observability","llm-proxy","ollama","opentelemetry","otel","typescript"],"latest_commit_sha":null,"homepage":"http://grokkist.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kckempf.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-27T21:16:33.000Z","updated_at":"2026-06-04T21:11:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kckempf/yallmap","commit_stats":null,"previous_names":["kckempf/yallmap"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/kckempf/yallmap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kckempf%2Fyallmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kckempf%2Fyallmap/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kckempf%2Fyallmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kckempf%2Fyallmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kckempf","download_url":"https://codeload.github.com/kckempf/yallmap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kckempf%2Fyallmap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34093798,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-gateway","anthropic","claude","claude-code","langfuse","llm-gateway","llm-observability","llm-proxy","ollama","opentelemetry","otel","typescript"],"created_at":"2026-06-09T06:06:17.556Z","updated_at":"2026-06-09T06:06:19.579Z","avatar_url":"https://github.com/kckempf.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# yallmap\n\n![CI](https://github.com/kckempf/yallmap/actions/workflows/ci.yml/badge.svg)\n![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)\n![Node](https://img.shields.io/badge/node-%3E%3D20-blue)\n\nAn OpenTelemetry-instrumented gateway for Anthropic-compatible LLMs. 
Drop it in front\nof [Claude Code](https://claude.ai/code) or any Anthropic SDK client to get per-request\ntoken tracking, cost attribution, and latency observability in\n[Langfuse](https://langfuse.com) — no client changes required.\n\n## Why this exists\n\nLiteLLM, Helicone, and Portkey are already out there, making this Yet Another LLM Proxy (YALLMAP). This is a different point in the design space: TypeScript-native, Anthropic API-first (not OpenAI-shaped), with routing rules expressed as typed functions instead of YAML or CEL. Built around Claude Code as a primary client, optimized for streaming and tool-use.  It's the LLM proxy that I need, so I've built it and shared it, as I can't be the only one who works the way I do.\n\n## Try it in 2 minutes (if you have Claude Code installed)\n\n```bash\ngit clone https://github.com/kckempf/yallmap \u0026\u0026 cd yallmap\nnpm install\nnpm run dev                          # starts on :3001\n\n# in another shell:\nANTHROPIC_BASE_URL=http://localhost:3001 claude\n```\n\nLangfuse is **optional** — the gateway works without it; you just lose the telemetry\nhalf. The telemetry exporter warns on startup if `OTEL_EXPORTER_OTLP_ENDPOINT` is unset,\nthen continues running normally.\n\n## Status\n\n**v0.8** — Production hardening. See roadmap below.\n\n## How it works\n\n```text\nClaude Code ──► yallmap :3001 ──► api.anthropic.com\n                     │          └───► Ollama (ollama/* models)\n                     │\n                     └──► Langfuse (via OTLP)\n                          gen_ai.system\n                          gen_ai.request.model\n                          gen_ai.usage.input_tokens\n                          gen_ai.usage.output_tokens\n                          gen_ai.response.finish_reasons\n```\n\nEvery request to `POST /v1/messages` is routed to the appropriate provider based on\nTypeScript routing rules. SSE streaming is piped through without buffering. 
A transform\nstream reads SSE events in-flight to extract token usage, emitted as a `gen_ai.request`\nspan when the response completes.\n\nOllama requests are automatically translated between the Anthropic Messages API format\nand Ollama's OpenAI-compatible API — the client always speaks Anthropic.\n\n## Prerequisites\n\n- Node.js 20+\n- A running [Langfuse](https://langfuse.com/docs/deployment/self-host) instance\n  (optional; only needed for telemetry. Docker Compose quickstart: `docker-compose up -d` from the Langfuse repo)\n- [Ollama](https://ollama.ai) (optional — only needed for `ollama/*` model routing)\n\n## Setup\n\n```bash\nnpm install\ncp .env.example .env\n```\n\nEdit `.env`:\n\n```env\n# Assuming Langfuse is running on port 3000\nPORT=3001\n\n# Langfuse OTLP endpoint\nOTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3000/api/public/otel/v1/traces\n\n# Langfuse project keys (Settings → API Keys)\nLANGFUSE_PUBLIC_KEY=pk-lf-...\nLANGFUSE_SECRET_KEY=sk-lf-...\n\n# Optional overrides (defaults shown)\n# ANTHROPIC_BASE_URL=https://api.anthropic.com\n# OLLAMA_BASE_URL=http://localhost:11434\n```\n\n## Running\n\n```bash\n# Development (watch mode, loads .env)\nnpm run dev\n\n# Production\nnpm run build\nnpm start\n```\n\n## Pointing Claude Code at the gateway\n\n```bash\nANTHROPIC_BASE_URL=http://localhost:3001 claude\n```\n\nOr export it in your shell profile to make it permanent.\n\n## Authentication\n\nThe gateway supports a multi-key allowlist with per-key identity. 
Configure clients\nvia the `GATEWAY_API_KEYS` environment variable as comma-separated `label:secret`\npairs:\n\n```env\nGATEWAY_API_KEYS=alice:abc123,bob:def456,ci:ghj789\n```\n\nClients send their secret in the `x-gateway-key` header (separate from Anthropic's\n`x-api-key`, which is forwarded upstream untouched):\n\n```bash\ncurl -X POST http://localhost:3001/v1/messages \\\n  -H 'x-gateway-key: abc123' \\\n  -H \"x-api-key: $ANTHROPIC_API_KEY\" \\\n  -H 'anthropic-version: 2023-06-01' \\\n  -H 'content-type: application/json' \\\n  -d '{\"model\":\"claude-sonnet-4-6\",\"max_tokens\":100,\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}'\n```\n\nFrom the Anthropic SDK, pass the header via `defaultHeaders`:\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n\nconst client = new Anthropic({\n  baseURL: 'http://localhost:3001',\n  defaultHeaders: { 'x-gateway-key': process.env.GATEWAY_KEY },\n});\n```\n\nThe authenticated `keyId` (the label half of the pair) propagates downstream:\n\n- **Request logs** — appears as `keyId` in the structured JSON log line.\n- **Rate limiting** — the default `rateLimit` keys on `keyId` when set.\n- **Langfuse traces** — emitted as the `user.id` span attribute, surfaced as the\n  user filter in the Langfuse UI.\n\nWhen `GATEWAY_API_KEYS` is unset, the gateway runs **unauthenticated**. This keeps\nthe 60-second quickstart frictionless but is unsafe for any deployment with a\npublic endpoint. A warning is logged on startup if `NODE_ENV=production` and no\nkeys are configured.\n\n## Middleware\n\nMiddleware runs before every upstream call. It can inspect or modify the request, reject\nit early, or observe the response. 
Middleware is configured in `src/middleware/config.ts`.\n\n```typescript\nimport { costGuard, rateLimit, piiRedactor } from './index';\nimport type { MiddlewareFn } from './types';\n\nexport const middlewares: MiddlewareFn[] = [\n  costGuard(0.10),                              // reject if worst-case cost \u003e $0.10\n  rateLimit({ requests: 100, windowMs: 60_000 }),  // 100 req/min per API key\n  piiRedactor([/\\b\\d{3}-\\d{2}-\\d{4}\\b/g]),     // redact SSNs from messages\n];\n```\n\n### Built-in middleware\n\n| Factory | Description |\n| --- | --- |\n| `costGuard(limitUsd)` | Rejects with 429 when worst-case cost (model × max_tokens) exceeds `limitUsd`. Uses the built-in pricing table; unknown models pass through. |\n| `apiKeyAuth({ keys, headerName? })` | Allowlist authentication. Rejects with 401 if `x-gateway-key` is missing or unknown. On success, sets `ctx.auth.keyId` for downstream middleware. See [Authentication](#authentication). |\n| `rateLimit({ requests, windowMs, keyFn? })` | In-memory fixed-window counter. Keys on `ctx.auth.keyId` when present, otherwise `x-api-key`. Override with `keyFn`. **State is per-process and resets on restart — do not deploy behind a load balancer without a shared store.** |\n| `piiRedactor(patterns, replacement?)` | Regex-replaces matches in message `text` content blocks before forwarding. 
|\n\n### Writing custom middleware\n\nMiddleware is a `(ctx, next) =\u003e Promise\u003cResponse\u003e` function:\n\n```typescript\nimport type { MiddlewareFn } from './types';\n\nconst myMiddleware: MiddlewareFn = async (ctx, next) =\u003e {\n  // inspect: ctx.model, ctx.maxTokens, ctx.body, ctx.clientHeaders\n  if (ctx.model.startsWith('claude-opus')) {\n    return new Response(JSON.stringify({ type: 'error', error: { type: 'forbidden', message: 'Opus not allowed' } }), {\n      status: 403, headers: { 'content-type': 'application/json' },\n    });\n  }\n  return next();  // or: const res = await next(); then inspect res\n};\n```\n\n## Routing\n\nRouting rules live in `src/routing/config.ts`. Rules are TypeScript functions —\nno YAML, no DSL.\n\n```typescript\nimport { firstMatch, whenModel, chain, anthropic, ollama } from './index';\n\nexport const router = firstMatch([\n  // Route ollama/* models to local Ollama, fall back to Anthropic if unavailable\n  whenModel(/^ollama\\//i, chain(ollama, anthropic)),\n]);\n```\n\n### Helpers\n\n| Helper | Description |\n| --- | --- |\n| `whenModel(pattern, provider)` | Match on model name (string or regex) |\n| `chain(p1, p2, ...)` | Try providers left-to-right; fall back on 5xx or network error |\n| `firstMatch(rules, fallback?)` | Evaluate rules top-to-bottom; first match wins |\n\n### Fallback behaviour\n\nWhen a provider list is returned (via `chain`), the proxy tries each in order:\n\n- **429 / 503 / 529** — retry the same provider with exponential backoff (see [Retries](#retries))\n- **Other 5xx** — drain the body, try the next provider immediately\n- **Network error** — try the next provider immediately\n- **4xx** — forward to the client immediately (no retry)\n- **All providers exhausted** — return 502\n\n## Agent sessions\n\nWhen an agent makes many LLM calls in a loop, the gateway can correlate them into a\nsingle session in Langfuse using either of two mechanisms:\n\n### `x-session-id` header — simple 
loops\n\nSet the same UUID on every call in an agent run. The gateway attaches it as a `session.id` span attribute (standard OTel; also recognised\nby Langfuse) and strips the header before forwarding to upstream.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\nimport { randomUUID } from 'crypto';\n\nconst client = new Anthropic({ baseURL: 'http://localhost:3001' });\nconst sessionId = randomUUID();\n\nfor (const step of agentSteps) {\n  await client.messages.create(step, {\n    headers: { 'x-session-id': sessionId },\n  });\n}\n```\n\n### W3C `traceparent` — OTel-instrumented frameworks\n\nIf your agent framework (LangChain, CrewAI, custom OTel setup) propagates W3C trace\ncontext, the gateway automatically nests its `gen_ai.request` spans as children of the\nincoming trace. No code changes needed on the client side.\n\n## Retries\n\nThe proxy retries 429 (rate limited), 503 (service unavailable), and 529 (Anthropic\noverloaded) on the same provider before falling back to the next one.\n\n**Backoff**: full jitter — `random(0, baseDelay × 2^attempt)`. If the upstream sends a\n`Retry-After` header (≤ 60 s), that value is used instead.\n\n**Environment variables:**\n\n| Variable | Default | Description |\n| --- | --- | --- |\n| `MAX_RETRIES` | `3` | Per-provider retry attempts |\n| `RETRY_BASE_DELAY_MS` | `1000` | Base delay for backoff (ms) |\n\n## Upstream timeouts\n\nThe gateway uses [`undici`](https://github.com/nodejs/undici) for upstream calls.\nTwo granular timeouts cap how long we wait for an upstream provider. Both also\nrespond to per-request `AbortSignal` cancellation, so a client disconnect cancels\nthe upstream call in flight.\n\n| Variable | Default | Description |\n| --- | --- | --- |\n| `UPSTREAM_HEADERS_TIMEOUT_MS` | `30000` | Time to wait for the first response byte (ms). Mirrors undici's `headersTimeout`. |\n| `UPSTREAM_BODY_TIMEOUT_MS` | `300000` | Time to wait for the full response body (ms). Mirrors undici's `bodyTimeout`. 
|\n\n## Graceful shutdown\n\nOn `SIGTERM` or `SIGINT` the gateway:\n\n1. Aborts in-flight upstream calls (the per-request `AbortSignal` is chained to a\n   process-wide shutdown signal).\n2. Stops accepting new connections via `server.close()`.\n3. If `server.close()` hasn't returned within `SHUTDOWN_TIMEOUT_MS`, forces sockets\n   shut with `server.closeAllConnections()`.\n4. Flushes the OpenTelemetry SDK so trailing spans reach Langfuse.\n5. Exits.\n\nThe default 25 s timeout stays under ECS's 30 s `SIGKILL` window so the orchestrator\nsees a clean exit during rolling deploys.\n\n| Variable | Default | Description |\n| --- | --- | --- |\n| `SHUTDOWN_TIMEOUT_MS` | `25000` | Max drain time before sockets are force-closed (ms). |\n\n## Logging\n\nRequest logs are written as structured JSON to stdout — compatible with CloudWatch,\nDatadog, or any log aggregation tool.\n\n```json\n{\"level\":30,\"time\":1748470913,\"requestId\":\"a3f7b912\",\"method\":\"POST\",\"path\":\"/v1/messages\",\n \"status\":200,\"latencyMs\":487,\"model\":\"claude-sonnet-4-6\",\"provider\":\"anthropic\",\n \"inputTokens\":343,\"outputTokens\":13,\"costUsd\":0.000224}\n```\n\nIn development (`NODE_ENV=development`), set `LOG_LEVEL=debug` and logs are formatted\nwith `pino-pretty` for readability.\n\n**Environment variables:**\n\n| Variable | Default | Description |\n| --- | --- | --- |\n| `LOG_LEVEL` | `info` | `trace` \\| `debug` \\| `info` \\| `warn` \\| `error` \\| `fatal` |\n| `CAPTURE_CONTENT` | _(unset)_ | Set to `true` to record prompt and completion in Langfuse traces (`gen_ai.prompt` / `gen_ai.completion` span attributes). Off by default — message content stays out of telemetry. |\n\n## Cost tracking\n\nThe `gen_ai.usage.cost_usd` span attribute is set on every non-streaming response where\nthe model is in the pricing table. Cost also appears in the request log as `costUsd`.\n\nPricing data lives in `src/pricing/anthropic.ts` (auto-generated). 
To refresh it:\n\n```bash\nnpm run update-pricing\n```\n\nThe script fetches the [LiteLLM community pricing registry](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json),\nvalidates the schema, prints a human-readable diff, and regenerates the file. It exits\nnon-zero if the upstream schema changes in a breaking way, so CI fails loudly.\n\nA GitHub Actions workflow (`.github/workflows/update-pricing.yml`) runs this every\nMonday and opens a PR when prices change.\n\n## What you see in Langfuse\n\n![Langfuse trace](docs/langfuse-trace.png \"Screenshot of a Langfuse trace including metadata attributes\")\n\nEach request produces a `gen_ai.request` span with:\n\n| Attribute | Example |\n| --- | --- |\n| `gen_ai.system` | `anthropic` or `ollama` |\n| `gen_ai.request.model` | `claude-sonnet-4-6` |\n| `gen_ai.request.max_tokens` | `32000` |\n| `gen_ai.response.model` | `claude-sonnet-4-6` |\n| `gen_ai.usage.input_tokens` | `343` |\n| `gen_ai.usage.output_tokens` | `13` |\n| `gen_ai.usage.cost_usd` | `0.000224` |\n| `gen_ai.response.finish_reasons` | `[\"end_turn\"]` |\n| `gen_ai.prompt` | `[{\"role\":\"user\",\"content\":\"Hello\"}]` _(opt-in: `CAPTURE_CONTENT=true`)_ |\n| `gen_ai.completion` | `[{\"type\":\"text\",\"text\":\"Hi there\"}]` _(opt-in: `CAPTURE_CONTENT=true`)_ |\n\n`gen_ai.system` reflects the provider that actually handled the request — useful for\ndistinguishing local vs. cloud inference in Langfuse dashboards.\n\n## Docker\n\n```bash\ndocker build -t yallmap .\ndocker run -p 3001:3001 \\\n  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:3000/api/public/otel/v1/traces \\\n  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \\\n  -e LANGFUSE_SECRET_KEY=sk-lf-... \\\n  yallmap\n```\n\nThe multi-stage Dockerfile builds in `node:22-alpine` and copies only compiled output into\nthe final image. 
No dev dependencies or TypeScript source in the production image.\n\nFor AWS deployment, see the companion CDK construct:\n[cdk-yallmap](https://github.com/kevinkempf/cdk-yallmap).\n\n## Adding a provider\n\nImplement `ProviderAdapter` from `src/adapters/types.ts`:\n\n```typescript\n// src/adapters/my-provider.ts\nimport type { ProviderAdapter } from './types';\n\nexport const myProviderAdapter: ProviderAdapter = {\n  path: '/v1/chat/completions',           // upstream path\n  translateRequest: (body) =\u003e { /* ... */ return translated; },\n  translateResponse: (body) =\u003e { /* ... */ return translated; },\n  createStreamTranslator: () =\u003e new MyStreamTransform(),\n};\n```\n\nThen add a `Provider` entry in `src/routing/index.ts` and reference it in\n`src/routing/config.ts`. The existing Ollama adapter is the reference implementation.\n\n## Design decisions\n\n**Provider adapters as a formal interface.** `ProviderAdapter` defines the three\ntranslation surfaces — request body, response body, SSE stream — so new providers are\ndrop-in files with no changes to the router or proxy. The `anthropicAdapter` is an\nidentity pass-through; the `ollamaAdapter` is the reference implementation of a full\ntranslation.\n\n**Routing policies as TypeScript functions.** Rules are typed predicates — `whenModel`,\n`chain`, `firstMatch`. No YAML DSL, no CEL expressions. Adding a rule is adding a line\nof code with full type safety and IDE autocomplete.\n\n**Anthropic API surface preserved end-to-end.** Ollama uses an OpenAI-compatible API;\nthe gateway translates requests and responses transparently so all clients speak the\nAnthropic Messages API regardless of which provider handles the request.\n\n**SSE never buffered.** The streaming response is piped through a Transform stream that\nreads events in-flight. 
The client receives bytes as they arrive; nothing is held in\nmemory waiting for the response to complete.\n\n**OTel Gen AI semantic conventions.** Spans use the\n[`gen_ai.*` attribute namespace](https://opentelemetry.io/docs/specs/semconv/gen-ai/)\nso traces are interoperable with any OTel-compatible backend, not just Langfuse.\n\n**`accept-encoding: identity` enforced upstream.** Compressed responses can't be parsed\nfor telemetry. The gateway requests uncompressed from upstream and forwards uncompressed\nto the client.\n\n**Middleware as a compile-time chain.** Middleware is a list of typed\n`(ctx, next) =\u003e Promise\u003cResponse\u003e` functions composed at startup. Each function either\ncalls `next()` to continue or returns its own Response to short-circuit. This keeps the\nproxy loop clean — policy decisions (cost guards, rate limiting, PII redaction) live\noutside the retry/fallback logic and are trivially testable in isolation.\n\n## Roadmap\n\n- [x] v0.1 — transparent Anthropic proxy + OTel observability\n- [x] v0.2 — TypeScript routing policies, Ollama adapter, fallback chains\n- [x] v0.3 — cost tracking, exponential retry with backoff, structured pino logging\n- [x] v0.4 — CDK construct for ECS Fargate deployment ([cdk-yallmap](https://github.com/kevinkempf/cdk-yallmap))\n- [x] v0.5 — formalized `ProviderAdapter` interface; drop-in provider plugins; agent session groundwork (`x-session-id`, W3C trace context)\n- [x] v0.6 — compile-time middleware chain (`costGuard`, `rateLimit`, `piiRedactor`; custom middleware support); opt-in content capture (`CAPTURE_CONTENT`)\n- [x] v0.7 — multi-key auth (`apiKeyAuth`) with identity propagation to rate limit, logs, and Langfuse `user.id`; request body-size limit; clamped `Retry-After` + equal-jitter backoff; OSS publication scaffolding (LICENSE, CONTRIBUTING, CI, Dependabot)\n- [x] v0.8 — production hardening: streaming body-size cap, configurable upstream timeouts + per-request `AbortSignal`, graceful HTTP 
shutdown\n- [ ] v0.9 — TBD (candidates: persistent rate-limit state, per-key cost budgets, Prometheus `/metrics`)\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkckempf%2Fyallmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkckempf%2Fyallmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkckempf%2Fyallmap/lists"}