{"id":45874810,"url":"https://github.com/ferro-labs/ai-gateway","last_synced_at":"2026-05-24T08:02:02.701Z","repository":{"id":340926134,"uuid":"1166416848","full_name":"ferro-labs/ai-gateway","owner":"ferro-labs","description":"Unified AI Gateway for 30+ LLMs (OpenAI, Anthropic, Bedrock, Azure etc) with Caching, Guardrails, A/B test \u0026 cost controls. Go-native Fastest \u0026 Scalable AI Gateway LiteLLM \u0026 Kong AI Gateway alternative.","archived":false,"fork":false,"pushed_at":"2026-05-23T11:22:58.000Z","size":2834,"stargazers_count":95,"open_issues_count":22,"forks_count":15,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-23T13:19:26.115Z","etag":null,"topics":["ai-gateway","ai-infrastructure","gateway","guardrails","kong","litellm","llm","llm-cost","llm-proxy","llm-strategy","llmops","mcp","pii-detection","prompt-management","semantic-cache"],"latest_commit_sha":null,"homepage":"https://docs.ferrolabs.ai","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ferro-labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-02-25T07:46:16.000Z","updated_at":"2026-05-22T12:31:39.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ferro-labs/ai-gateway","commit_stats":null,"previous_names":["ferro-labs/ai-gateway"],"tags_count":27,"template":false,"template_full_name":null,"purl":"pkg:gi
thub/ferro-labs/ai-gateway","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferro-labs%2Fai-gateway","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferro-labs%2Fai-gateway/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferro-labs%2Fai-gateway/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferro-labs%2Fai-gateway/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ferro-labs","download_url":"https://codeload.github.com/ferro-labs/ai-gateway/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferro-labs%2Fai-gateway/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33426013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T22:14:44.296Z","status":"online","status_checked_at":"2026-05-24T02:00:06.296Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-gateway","ai-infrastructure","gateway","guardrails","kong","litellm","llm","llm-cost","llm-proxy","llm-strategy","llmops","mcp","pii-detection","prompt-management","semantic-cache"],"created_at":"2026-02-27T11:30:08.666Z","updated_at":"2026-05-24T08:02:02.694Z","avatar_url":"https://github.com/ferro-labs.png","language":"Go","funding_links":[],"categories":["*Ops for AI","🧭 Open Source LLM Gateways and Serving 
Systems","Inference"],"sub_categories":["LLMOps","LLM Router"],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cp align=\"right\"\u003e\n  \u003ca href=\"README.md\"\u003eEnglish\u003c/a\u003e | \u003ca href=\"README.zh-CN.md\"\u003e中文\u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003e\n  \u003cimg src=\"docs/logo.png\" alt=\"Ferro Labs AI Gateway\" height=\"60\" align=\"absmiddle\" /\u003e Ferro Labs AI Gateway\n\u003c/h1\u003e\n\n**High-performance AI gateway in Go. Route LLM requests across 30 providers via a single OpenAI-compatible API.**\n\n**Deploy templates**\n\n[![Deploy on Railway: SQLite](https://railway.com/button.svg)](https://railway.com/deploy/ferro-labs-ai-sqlite-storage?referralCode=KblxKX\u0026utm_medium=integration\u0026utm_source=template\u0026utm_campaign=generic)\n[![Deploy on Railway: PostgreSQL](https://railway.com/button.svg)](https://railway.com/deploy/ferro-labs-ai-postgresql-storage?referralCode=KblxKX\u0026utm_medium=integration\u0026utm_source=template\u0026utm_campaign=generic)\n[![Deploy to Render: PostgreSQL](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/ferro-labs/ai-gateway)\n\n[![Go](https://img.shields.io/badge/go-1.25+-00ADD8.svg)](https://go.dev)\n[![Go Reference](https://pkg.go.dev/badge/github.com/ferro-labs/ai-gateway.svg)](https://pkg.go.dev/github.com/ferro-labs/ai-gateway)\n[![codecov](https://codecov.io/gh/ferro-labs/ai-gateway/branch/main/graph/badge.svg)](https://codecov.io/gh/ferro-labs/ai-gateway)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![GitHub Stars](https://img.shields.io/github/stars/ferro-labs/ai-gateway?style=flat\u0026color=yellow)](https://github.com/ferro-labs/ai-gateway/stargazers)\n[![CI](https://github.com/ferro-labs/ai-gateway/actions/workflows/ci.yml/badge.svg)](https://github.com/ferro-labs/ai-gateway/actions/workflows/ci.yml)\n[![Code 
Scanning](https://github.com/ferro-labs/ai-gateway/actions/workflows/code-scanning.yml/badge.svg)](https://github.com/ferro-labs/ai-gateway/actions/workflows/code-scanning.yml)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg?url=https%3A%2F%2Fdeepwiki.com%2Fferro-labs%2Fai-gateway)](https://deepwiki.com/ferro-labs/ai-gateway)\n[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/ferro-labs)](https://artifacthub.io/packages/search?org=ferro-labs)\n[![Discord](https://img.shields.io/badge/Discord-Join%20Us-5865F2?logo=discord\u0026logoColor=white)](https://discord.gg/yCAeYvJeDV)\n\n🔀 **30 providers, 2,500+ models — one API**\u003cbr/\u003e\n⚡ **13,925 RPS at 1,000 concurrent users**\u003cbr/\u003e\n📦 **Single binary, zero dependencies, 32 MB base memory**\n\n\u003cimg src=\"docs/architecture.svg\" alt=\"Ferro Labs AI Gateway Architecture\" width=\"100%\" /\u003e\n\n\u003c/div\u003e\n\n---\n\n## Quick Start\n\nGet from zero to first request in under 2 minutes.\n\n### Option A — Binary (fastest)\n\n```bash\ncurl -fsSL https://github.com/ferro-labs/ai-gateway/releases/download/v1.0.6/ferrogw_1.0.6_linux_amd64.tar.gz | tar xz\nchmod +x ferrogw\n./ferrogw init          # generates config.yaml + MASTER_KEY\n./ferrogw               # starts the server\n```\n\n### Option B — Docker\n\n```bash\ndocker pull ghcr.io/ferro-labs/ai-gateway:latest\ndocker run -p 8080:8080 \\\n  -e OPENAI_API_KEY=sk-your-key \\\n  -e MASTER_KEY=fgw_your-master-key \\\n  ghcr.io/ferro-labs/ai-gateway:latest\n```\n\n### Option C — Go\n\n```bash\ngo install github.com/ferro-labs/ai-gateway/cmd/ferrogw@latest\nferrogw init            # first-run setup\nferrogw                 # start the server\n```\n\n### First-time setup\n\n`ferrogw init` generates a master key and writes a minimal `config.yaml`:\n\n```\n$ ferrogw init\n\n  Master key (set as MASTER_KEY env var):\n  fgw_a3f2e1d4c5b6a7f8e9d0c1b2a3f4e5d6\n\n  Config written to: ./config.yaml\n\n  Next steps:\n   
 export MASTER_KEY=fgw_a3f2e1d4c5b6a7f8e9d0c1b2a3f4e5d6\n    export OPENAI_API_KEY=sk-...\n    ferrogw\n```\n\nThe master key is shown once — store it in your `.env` file or secret manager. It is never written to disk.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/demo.gif\" alt=\"Ferro Labs AI Gateway — Quick Start Demo\" width=\"720\" /\u003e\n\u003c/div\u003e\n\n### Minimal config\n\nCreate `config.yaml` (or use `ferrogw init`):\n\n```yaml\nstrategy:\n  mode: fallback\n\ntargets:\n  - virtual_key: openai\n    retry:\n      attempts: 3\n      on_status_codes: [429, 502, 503]\n  - virtual_key: anthropic\n\naliases:\n  fast: gpt-4o-mini\n  smart: claude-3-5-sonnet-20241022\n```\n\n### First request\n\n```bash\nexport OPENAI_API_KEY=sk-your-key\nexport MASTER_KEY=fgw_your-master-key   # set by ferrogw init\n\ncurl http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer $MASTER_KEY\" \\\n  -d '{\n    \"model\": \"gpt-4o-mini\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello from Ferro Labs AI Gateway\"}]\n  }' | jq\n```\n\n---\n\n## Why Ferro Labs\n\nMost AI gateways are Python proxies that crack under load or JavaScript services that eat memory. 
Ferro Labs AI Gateway is written in Go from the ground up for real-world throughput — a single binary that routes LLM requests with predictable latency and minimal resource usage.\n\n| Feature          | Ferro Labs  | LiteLLM | Bifrost    | Kong AI     |\n|:-----------------|:------------|:--------|:-----------|:------------|\n| Language         | Go          | Python  | Go         | Go/Lua      |\n| Single binary    | ✅          | ❌      | ✅         | ❌          |\n| Providers        | 30          | 100+    | 20+        | 10+         |\n| MCP support      | ✅          | ❌      | ✅         | ❌          |\n| Response cache   | ✅          | ✅      | ✅         | ❌ (paid)   |\n| Guardrails       | ✅          | ✅      | ❌         | ❌ (paid)   |\n| OSS license      | Apache 2.0  | MIT     | Apache 2.0 | Apache 2.0  |\n| Managed cloud    | Coming Soon | ✅      | ✅         | ✅          |\n\n---\n\n## Performance\n\nBenchmarked against Kong OSS, Bifrost, LiteLLM, and Portkey on\n**GCP n2-standard-8** (8 vCPU, 32 GB RAM) using a **60ms fixed-latency\nmock upstream** — results reflect gateway overhead only.\n\n![Throughput comparison — Ferro Labs vs Kong, Bifrost, LiteLLM, Portkey across 150–1,000 VU](docs/benchmarks/throughput-comparison.png)\n\n### Ferro Labs Latency Profile\n\n| VU | RPS | p50 | p99 | Memory |\n|---:|---:|---:|---:|---:|\n| 50 | 813 | 61.3ms | 64.1ms | 36 MB |\n| 150 | 2,447 | 61.2ms | 63.4ms | 47 MB |\n| 300 | 4,890 | 61.2ms | 64.4ms | 72 MB |\n| 500 | 8,014 | 61.5ms | 72.9ms | 89 MB |\n| 1,000 | 13,925 | 68.1ms | 111.9ms | 135 MB |\n\nAt 1,000 VU: **13,925 RPS**, p50 overhead **8.1ms**, memory **135 MB**.\nNo connection pool failures. 
No throughput ceiling.\n\n### Live Upstream Overhead (OpenAI API)\n\nMeasured against the **live OpenAI API** (gpt-4o-mini) using two independent methods:\nthe gateway's `X-Gateway-Overhead-Ms` response header (precise internal timing)\nand paired direct-vs-gateway requests (external black-box validation).\n\n| Configuration | Overhead p50 | Overhead p99 |\n|:---|---:|---:|\n| No plugins (bare proxy) | **0.002ms** (2 microseconds) | 0.03ms |\n| With plugins (word-filter, max-token, logger, rate-limit) | **0.025ms** (25 microseconds) | 0.074ms |\n\nThe gateway adds **25 microseconds** of processing overhead per request in a typical\nproduction configuration. LLM API calls typically take 500ms-2s, so that overhead is at least\n20,000x smaller than the provider latency it sits in front of.\n\n### How to Reproduce\n\n```bash\ngit clone https://github.com/ferro-labs/ai-gateway-performance-benchmarks\ncd ai-gateway-performance-benchmarks\nmake setup \u0026\u0026 make bench\n```\n\nFull methodology, raw results, and flamegraph analysis:\n[ferro-labs/ai-gateway-performance-benchmarks](https://github.com/ferro-labs/ai-gateway-performance-benchmarks)\n\n---\n\n## Features\n\n### 🔀 Routing\n\n- **8 routing strategies:** single, fallback, load balance, least latency, cost-optimized, content-based, A/B test, conditional\n- Provider failover with configurable retry policies and status code filters\n- Per-request model aliases (`fast → gpt-4o-mini`, `smart → claude-3-5-sonnet`)\n\n### 🔌 Providers (30)\n\n| OpenAI \u0026 Compatible | Anthropic \u0026 Google | Cloud \u0026 Enterprise | Open Source \u0026 Inference |\n|:---|:---|:---|:---|\n| OpenAI | Anthropic | AWS Bedrock | Ollama, Ollama Cloud |\n| Azure OpenAI | Google Gemini | Azure Foundry | Hugging Face |\n| OpenRouter | Vertex AI | Databricks | Replicate |\n| DeepSeek | | Cloudflare Workers AI | Together AI |\n| Perplexity | | | Fireworks |\n| xAI (Grok) | | | DeepInfra |\n| Mistral | | | NVIDIA NIM |\n| Groq | | | SambaNova |\n| Cohere | | | Novita AI |\n| AI21 | | | 
Cerebras |\n| Moonshot / Kimi | | | Qwen / DashScope |\n\n### 🛡️ Guardrails \u0026 Plugins\n\n- **Word/phrase filtering** — block sensitive terms before they reach providers\n- **Token and message limits** — enforce max_tokens and max_messages per request\n- **Response caching** — in-memory cache with configurable TTL and entry limits\n- **Rate limiting** — global RPS plus per-API-key and per-user RPM limits\n- **Budget controls** — per-API-key USD spend tracking with configurable token pricing\n- **Request logging** — structured logs with optional SQLite/PostgreSQL persistence\n\n### ⚡ Performance\n\n- Per-provider HTTP connection pools with optimized settings\n- `sync.Pool` for JSON marshaling buffers and streaming I/O\n- Zero-allocation stream detection, async hook dispatch batching\n- Single binary, ~32 MB base memory, linear scaling to 1,000+ VUs\n\n### 🤖 MCP (Model Context Protocol)\n\n- Agentic tool-call loop — the gateway drives `tool_calls` automatically\n- Streamable HTTP transport (MCP 2025-11-25 spec)\n- Tool filtering with `allowed_tools` and bounded `max_call_depth`\n- Multiple MCP servers with cross-server tool deduplication\n\n### 📊 Observability\n\n- **OpenTelemetry tracing** (v1.1.0+) — OTLP gRPC/HTTP exporter, W3C `traceparent` propagation, GenAI semantic conventions (`gen_ai.*`) plus `ferro.*` extensions for cost, routing, MCP, and stream timings; `privacy_level` enforced on error recording; configurable `shutdown_grace`\n- Prometheus metrics at `/metrics`\n- Deep health checks at `/health` with per-provider status\n- Structured JSON request logging with SQLite/PostgreSQL persistence (trace ID unified across logs, OTel spans, and `X-Request-ID` response header)\n- Admin API with usage stats, request logs, and config history/rollback\n- Built-in dashboard UI at `/dashboard`\n- HTTP-level connection tracing with DNS, TLS, and first-byte latency\n\n---\n\n## Examples\n\nIntegration examples for common use cases are in 
[ferro-labs/ai-gateway-examples](https://github.com/ferro-labs/ai-gateway-examples):\n\n| Example | Description |\n|:--------|:------------|\n| [basic](https://github.com/ferro-labs/ai-gateway-examples/tree/main/basic) | Single chat completion to the first configured provider |\n| [fallback](https://github.com/ferro-labs/ai-gateway-examples/tree/main/fallback) | Fallback strategy — try providers in order with retries |\n| [loadbalance](https://github.com/ferro-labs/ai-gateway-examples/tree/main/loadbalance) | Weighted load balancing across targets (70/30 split) |\n| [with-guardrails](https://github.com/ferro-labs/ai-gateway-examples/tree/main/with-guardrails) | Built-in word-filter and max-token guardrail plugins |\n| [with-mcp](https://github.com/ferro-labs/ai-gateway-examples/tree/main/with-mcp) | Local MCP server with tool-calling integration |\n| [embedded](https://github.com/ferro-labs/ai-gateway-examples/tree/main/embedded) | Embed the gateway as an HTTP handler inside an existing server |\n\n---\n\n## Configuration\n\nFull annotated example — copy to `config.yaml` and customize:\n\n```yaml\n# Routing strategy\nstrategy:\n  mode: fallback  # single | fallback | loadbalance | conditional\n                  # least-latency | cost-optimized | content-based | ab-test\n\n# Provider targets (tried in order for fallback mode)\ntargets:\n  - virtual_key: openai\n    retry:\n      attempts: 3\n      on_status_codes: [429, 502, 503]\n      initial_backoff_ms: 100\n  - virtual_key: anthropic\n    retry:\n      attempts: 2\n  - virtual_key: gemini\n\n# Model aliases — resolved before routing\naliases:\n  fast: gpt-4o-mini\n  smart: claude-3-5-sonnet-20241022\n  cheap: gemini-1.5-flash\n\n# Plugins — executed in order at the configured stage\nplugins:\n  - name: word-filter\n    type: guardrail\n    stage: before_request\n    enabled: true\n    config:\n      blocked_words: [\"password\", \"secret\"]\n      case_sensitive: false\n\n  - name: max-token\n    type: 
guardrail\n    stage: before_request\n    enabled: true\n    config:\n      max_tokens: 4096\n      max_messages: 50\n\n  - name: rate-limit\n    type: guardrail\n    stage: before_request\n    enabled: true\n    config:\n      requests_per_second: 100\n      key_rpm: 60\n\n  - name: request-logger\n    type: logging\n    stage: before_request\n    enabled: true\n    config:\n      level: info\n      persist: true\n      backend: sqlite\n      dsn: ferrogw-requests.db\n\n# MCP tool servers (optional)\nmcp_servers:\n  - name: my-tools\n    url: https://mcp.example.com/mcp\n    headers:\n      Authorization: Bearer ${MY_TOOLS_TOKEN}\n    allowed_tools: [search, get_weather]\n    max_call_depth: 5\n    timeout_seconds: 30\n```\n\nSee [config.example.yaml](config.example.yaml) and [config.example.json](config.example.json) for the full template with all options.\n\n---\n\n## Observability\n\nFerro Labs AI Gateway ships first-class **OpenTelemetry** support in v1.1.0+. When OTel is disabled (the default) the gateway runs with a zero-allocation no-op provider — there is no cost to leaving it off. 
When you set an OTLP endpoint, every request emits a `gateway.request` root span with rich GenAI semantic conventions plus Ferro-specific extensions for cost, routing, and stream timings.\n\n### Enable in one step\n\nEither set the standard OTel environment variable:\n\n```bash\nexport OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317\nferrogw serve\n```\n\n…or add an `observability` block to `config.yaml`:\n\n```yaml\nobservability:\n  tracing:\n    enabled: true\n    endpoint: localhost:4317   # or leave blank to read OTEL_EXPORTER_OTLP_ENDPOINT\n    protocol: grpc             # grpc | http/protobuf\n    service_name: ferrogw\n    sample_ratio: 1.0\n    privacy_level: metadata    # none | metadata | full  (see below)\n    shutdown_grace: 10s        # max time to drain OTel exports on shutdown\n    # headers:                        # OTLP export headers for authenticated backends\n    #   dd-api-key: \"${DATADOG_API_KEY}\"  # values support ${ENV_VAR} interpolation\n\n  # exporters wires plugin observability exporters (see \"Plugin exporters\" below).\n  # exporters:\n  #   - name: langsmith\n  #     enabled: true\n  #     config:\n  #       api_key: \"${LANGSMITH_API_KEY}\"\n```\n\nStandard `OTEL_*` environment variables (e.g. `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_TRACES_SAMPLER`) always take precedence over the config file — this matches the OTel SDK convention and is required for predictable container deployments.\n\n`observability.tracing.headers` lets you send OTLP traces to authenticated managed backends (Datadog, New Relic, Honeycomb, Grafana Cloud) by setting vendor-specific headers such as API keys. Values support `${ENV_VAR}` interpolation so secrets are never stored literally in the config file. The standard `OTEL_EXPORTER_OTLP_HEADERS` environment variable also applies per OTel convention.\n\nThe **endpoint scheme selects transport security**: an `https://` endpoint uses TLS, while an `http://` endpoint or a bare `host:port` (e.g. 
`localhost:4317`) connects in plaintext. Managed backends require the `https://` form.\n\n### What gets emitted\n\nThe following attributes are **currently emitted** on the `gateway.request` root span. Attributes marked \"Planned\" are reserved but not yet wired.\n\n- **`gateway.request`** root span per request (`SERVER` kind) with `gen_ai.system`, `gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.{input,output}_tokens`\n- **`HTTP {GET,POST}`** child span per outbound provider call (`CLIENT` kind, via `otelhttp` transport wrapping) — propagates `traceparent` to upstream providers\n- **`ferro.*` emitted attributes**: `ferro.cost.{usd,input_usd,output_usd,cache_read_usd,cache_write_usd,reasoning_usd,model_found}`, `ferro.routing.{strategy,target_key}`, `ferro.stream.time_to_{first,last}_token_ms`, `ferro.gateway.trace_id`, `ferro.plugin.{name,kind,stage,outcome,reason}`, `ferro.mcp.{server,tool,latency_ms}`\n- **W3C TraceContext + Baggage** propagation: inbound `traceparent` is honoured; outbound requests carry it forward\n- **Unified trace ID**: the OTel `trace_id`, the `X-Request-ID` response header, and the `trace_id` field on every log line are guaranteed equal per request for all requests served through the gateway's HTTP stack. (Embedders that bypass `logging.Middleware` receive a consistent-but-independent span trace ID.)\n\n### Try it locally with Jaeger\n\n```bash\ndocker run --rm -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one\nOTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 ferrogw serve\n# fire a request, then open http://localhost:16686\n```\n\n### Privacy levels\n\n`privacy_level` controls how error messages are recorded on spans. 
No prompt or response content is exported at any level — that requires a future L3 exporter plugin.\n\n| Level | Error recording on spans | Default |\n|:------|:------|:------|\n| `none` | Status and exception carry only the static string `\"redacted\"` — no content or internal type exposed | — |\n| `metadata` | Error message is redacted (email / JWT / AWS keys replaced by tokens) before being attached | ✅ |\n| `full` | Raw error text recorded without redaction — for trusted self-hosted debugging only | — |\n\nInvalid values are rejected at startup by config validation.\n\n### Plugin exporters\n\nThe `observability.exporters` config block wires plugin exporters that receive `gateway.request.completed` and `gateway.request.failed` events on every request. Exporters operate independently of whether an OTLP tracing endpoint is configured.\n\n**No built-in exporter plugins ship in this repo.** They are provided by the `ai-gateway-plugins` repository and self-register via `observability.RegisterExporter` in their `init()`. The `observability.Exporter` contract is stable as of v1.1.0. 
Unrecognised or failing exporters emit a warning and are skipped — the gateway still starts.\n\n---\n\n## CLI\n\n`ferrogw` is a single binary — no separate CLI tool required.\n\n| Command | Description |\n|:--------|:------------|\n| `ferrogw` | Start the gateway server (default) |\n| `ferrogw serve` | Start the gateway server (explicit) |\n| `ferrogw init` | First-run setup — generate master key and config |\n| `ferrogw validate` | Validate a config file without starting |\n| `ferrogw doctor` | Check environment (API keys, config, connectivity) |\n| `ferrogw status` | Show gateway health and provider status |\n| `ferrogw version` | Print version, commit, and build info |\n| `ferrogw admin keys list` | List API keys |\n| `ferrogw admin keys create \u003cname\u003e` | Create an API key |\n| `ferrogw admin logs stats` | Show request log statistics |\n| `ferrogw plugins` | List registered plugins |\n\nGlobal flags available on all subcommands: `--gateway-url`, `--api-key`, `--format` (table/json/yaml).\n\n---\n\n## Deployment\n\n### Local development\n\n```bash\nexport OPENAI_API_KEY=sk-your-key\nexport MASTER_KEY=fgw_your-master-key\nexport GATEWAY_CONFIG=./config.yaml\nmake build \u0026\u0026 ./bin/ferrogw\n```\n\n### Railway (SQLite)\n\nFor a fast Railway deploy with persistent SQLite storage, attach a Railway Volume at `/data` and set:\n\n```bash\nMASTER_KEY=fgw_your-master-key\nOPENAI_API_KEY=sk-your-key\nPORT=8080\nAPI_KEY_STORE_BACKEND=sqlite\nAPI_KEY_STORE_DSN=/data/keys.db\nCONFIG_STORE_BACKEND=sqlite\nCONFIG_STORE_DSN=/data/config.db\nREQUEST_LOG_STORE_BACKEND=sqlite\nREQUEST_LOG_STORE_DSN=/data/logs.db\nRAILWAY_RUN_UID=0\n```\n\n### Render (PostgreSQL)\n\nThe repo includes a `render.yaml` Blueprint for a one-click Render deploy with a Docker web service and managed Postgres database. 
It generates `MASTER_KEY`, asks the user for `OPENAI_API_KEY`, and wires the three store DSNs to the database's internal connection string automatically.\n\nUse the button at the top of this README, or deploy directly from:\n\n```text\nhttps://render.com/deploy?repo=https://github.com/ferro-labs/ai-gateway\n```\n\n### Docker Compose (dev \u0026 prod)\n\nThe repo ships three Compose files that follow the standard override pattern:\n\n| File | Purpose |\n|---|---|\n| `docker-compose.yml` | Base — shared image, port mapping, all provider env var stubs |\n| `docker-compose.dev.yml` | Dev — builds from source, debug logging, live config mount, Ollama host access |\n| `docker-compose.prod.yml` | Prod — pinned image tag, restart policy, health check, resource limits, log rotation |\n\n**Dev** (builds from source):\n\n```bash\ndocker compose -f docker-compose.yml -f docker-compose.dev.yml up\n```\n\n**Prod** (pin to a release tag — never use `latest` in production):\n\n```bash\nIMAGE_TAG=v1.0.6 CORS_ORIGINS=https://your-domain.com \\\n  docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d\n```\n\nProvider API keys are commented out in `docker-compose.yml`. 
Uncomment and set the ones you need, or supply them via a `.env` file in the same directory.\n\n---\n\n### Docker Compose (with PostgreSQL)\n\n```yaml\nservices:\n  ferrogw:\n    image: ghcr.io/ferro-labs/ai-gateway:latest\n    ports:\n      - \"8080:8080\"\n    environment:\n      - OPENAI_API_KEY=${OPENAI_API_KEY}\n      - GATEWAY_CONFIG=/etc/ferrogw/config.yaml\n      - CONFIG_STORE_BACKEND=postgres\n      - CONFIG_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable\n      - API_KEY_STORE_BACKEND=postgres\n      - API_KEY_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable\n      - REQUEST_LOG_STORE_BACKEND=postgres\n      - REQUEST_LOG_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable\n    volumes:\n      - ./config.yaml:/etc/ferrogw/config.yaml:ro\n    depends_on:\n      - db\n\n  db:\n    image: postgres:16-alpine\n    environment:\n      POSTGRES_USER: ferrogw\n      POSTGRES_PASSWORD: ferrogw\n      POSTGRES_DB: ferrogw\n    volumes:\n      - pgdata:/var/lib/postgresql/data\n\nvolumes:\n  pgdata:\n```\n\n### Kubernetes via Helm\n\n[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/ferro-labs)](https://artifacthub.io/packages/search?org=ferro-labs)\n\n```bash\nhelm repo add ferro-labs https://ferro-labs.github.io/helm-charts\nhelm repo update\nhelm install ferro-gw ferro-labs/ai-gateway \\\n  --set env.OPENAI_API_KEY=sk-your-key\n```\n\nHelm charts: [github.com/ferro-labs/helm-charts](https://github.com/ferro-labs/helm-charts) | [ArtifactHub](https://artifacthub.io/packages/search?org=ferro-labs)\n\n---\n\n## Migrate to Ferro Labs AI Gateway\n\n### From LiteLLM\n\nLiteLLM users can migrate in one step. 
Ferro Labs AI Gateway is OpenAI-compatible — change one line in your code:\n\n**Python (before — LiteLLM):**\n\n```python\nfrom litellm import completion\n\nresponse = completion(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}]\n)\n```\n\n**Python (after — Ferro Labs AI Gateway):**\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http://localhost:8080/v1\",\n    api_key=\"your-ferro-api-key\",\n)\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n)\n```\n\n**Node.js (after — Ferro Labs AI Gateway):**\n\n```typescript\nimport OpenAI from \"openai\";\n\nconst client = new OpenAI({\n  baseURL: \"http://localhost:8080/v1\",\n  apiKey: \"your-ferro-api-key\",\n});\n\nconst response = await client.chat.completions.create({\n  model: \"gpt-4o\",\n  messages: [{ role: \"user\", content: \"Hello\" }],\n});\n```\n\n**Why migrate from LiteLLM:**\n\n- 14x higher throughput at 150 concurrent users (2,447 vs 175 RPS)\n- 23x less memory at peak load (47 MB vs 1,124 MB under streaming)\n- Single binary — no Python environment, no pip, no virtualenv\n- Predictable latency — p99 stays under 65 ms at 150 VU vs LiteLLM's timeouts at the same concurrency\n\n**Config migration:**\n\n```\n# LiteLLM config.yaml               # Ferro Labs config.yaml\nmodel_list:                          strategy:\n  - model_name: gpt-4o                mode: fallback\n    litellm_params:\n      model: gpt-4o                  targets:\n      api_key: sk-...                  - virtual_key: openai\n  - model_name: claude-3-5-sonnet     - virtual_key: anthropic\n    litellm_params:\n      model: claude-3-5-sonnet       aliases:\n      api_key: sk-ant-...              fast: gpt-4o\n                                       smart: claude-3-5-sonnet-20241022\n```\n\nProvider API keys are set via environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) 
— not in the config file.\n\n### From Portkey\n\nPortkey users: Ferro Labs AI Gateway uses the standard OpenAI SDK — no custom headers required in self-hosted mode.\n\n**Before (Portkey hosted):**\n\n```python\nfrom portkey_ai import Portkey\n\nclient = Portkey(api_key=\"portkey-key\")\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n)\n```\n\n**After (Ferro Labs AI Gateway self-hosted):**\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http://localhost:8080/v1\",\n    api_key=\"your-ferro-api-key\",\n)\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n)\n```\n\n**Why migrate from Portkey:**\n\n- Fully open source — no per-request pricing, no log limits\n- Self-hosted — your data never leaves your infrastructure\n- No vendor lock-in — Apache 2.0 license\n- MCP support — Portkey self-hosted lacks native MCP\n- FerroCloud (coming soon) for teams that want a managed service\n\n### From OpenAI SDK directly\n\nNo gateway yet? Add Ferro Labs AI Gateway in front of your existing code with a single `base_url` change. 
Apart from using your gateway API key, no other code changes are required.\n\n```python\nfrom openai import OpenAI\n\n# Before — calling OpenAI directly\nclient = OpenAI(api_key=\"sk-...\")\n\n# After — routing through Ferro Labs AI Gateway\n# Gains: failover, caching, rate limiting, cost tracking\nclient = OpenAI(\n    base_url=\"http://localhost:8080/v1\",\n    api_key=\"your-ferro-api-key\",\n)\n```\n\nWith `strategy.mode: fallback` and more than one target configured, Ferro Labs AI Gateway handles provider failover automatically: if OpenAI is down, requests fall through to the next target (for example Anthropic or Gemini) with zero application code changes.\n\n---\n\n## FerroCloud\n\nFerroCloud — the managed version of Ferro Labs AI Gateway with multi-tenancy, analytics, and cost governance — is coming soon.\n\n👉 **Join the waitlist at [ferrolabs.ai](https://ferrolabs.ai)**\n\n---\n\n## SDKs\n\nOfficial client libraries for the Ferro Labs AI Gateway:\n\n| SDK | Install | Repository |\n|:----|:--------|:-----------|\n| Python | `pip install ferrolabs` | [ferro-labs/ferrolabs-python-sdk](https://github.com/ferro-labs/ferrolabs-python-sdk) |\n| TypeScript | `npm install ferrolabs` | [ferro-labs/ferrolabs-typescript-sdk](https://github.com/ferro-labs/ferrolabs-typescript-sdk) |\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003ePython\u003c/strong\u003e\u003c/summary\u003e\n\n```python\nfrom ferrolabs import FerroClient\n\nclient = FerroClient(\n    base_url=\"http://localhost:8080/v1\",\n    api_key=\"your-ferro-api-key\",\n)\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eTypeScript\u003c/strong\u003e\u003c/summary\u003e\n\n```typescript\nimport { FerroClient } from \"ferrolabs\";\n\nconst client = new FerroClient({\n  baseURL: \"http://localhost:8080/v1\",\n  apiKey: \"your-ferro-api-key\",\n});\n\nconst response = await client.chat.completions.create({\n  model: \"gpt-4o\",\n  messages: [{ role: \"user\", content: \"Hello\" 
}],\n});\n```\n\n\u003c/details\u003e\n\n### OpenAI SDK Compatible\n\nYou can also use the standard OpenAI SDK directly — just change the base URL:\n\n**Python:**\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=\"sk-ferro-...\",\n    base_url=\"http://localhost:8080/v1\",\n)\n```\n\n**TypeScript:**\n\n```typescript\nimport OpenAI from \"openai\";\n\nconst client = new OpenAI({\n  apiKey: \"sk-ferro-...\",\n  baseURL: \"http://localhost:8080/v1\",\n});\n```\n\n---\n\n## Contributing\n\nWe welcome contributions. New providers go in this OSS repo only — never in FerroCloud. See [CONTRIBUTING.md](CONTRIBUTING.md) for branch strategy, commit conventions, and PR guidelines.\n\n---\n\n## Community\n\n- [GitHub Discussions](https://github.com/ferro-labs/ai-gateway/discussions)\n- [Discord](https://discord.gg/yCAeYvJeDV)\n- Built with Ferro Labs AI Gateway? Open a PR to add to our showcase.\n\n---\n\n## License\n\nApache 2.0 — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fferro-labs%2Fai-gateway","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fferro-labs%2Fai-gateway","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fferro-labs%2Fai-gateway/lists"}