{"id":47724823,"url":"https://github.com/kianwoon/modelweaver","last_synced_at":"2026-04-15T06:05:18.206Z","repository":{"id":345377397,"uuid":"1185668713","full_name":"kianwoon/modelweaver","owner":"kianwoon","description":"Multi-provider model orchestration proxy for Claude Code. Route agent roles (planning, coding, research) to different LLM providers with automatic fallback, daemon mode, desktop GUI, config hot-reload, and crash recovery.","archived":false,"fork":false,"pushed_at":"2026-04-02T21:40:36.000Z","size":3968,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-03T05:56:00.240Z","etag":null,"topics":["ai-agents","anthropic","api-proxy","claude","claude-code","desktop-gui","developer-tools","fallback","hono","hot-reload","llm","llm-proxy","model-routing","multi-provider","openrouter","proxy","rate-limiting","sse","tauri","typescript"],"latest_commit_sha":null,"homepage":"https://github.com/kianwoon/modelweaver#readme","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kianwoon.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-18T20:31:20.000Z","updated_at":"2026-04-02T21:40:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kianwoon/modelweaver","commit_stats":null,"previous_names":["kianwoon/modelweaver"],"tags_count":15,"template":false,"templ
ate_full_name":null,"purl":"pkg:github/kianwoon/modelweaver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kianwoon%2Fmodelweaver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kianwoon%2Fmodelweaver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kianwoon%2Fmodelweaver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kianwoon%2Fmodelweaver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kianwoon","download_url":"https://codeload.github.com/kianwoon/modelweaver/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kianwoon%2Fmodelweaver/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31450269,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T15:22:31.103Z","status":"ssl_error","status_checked_at":"2026-04-05T15:22:00.205Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","anthropic","api-proxy","claude","claude-code","desktop-gui","developer-tools","fallback","hono","hot-reload","llm","llm-proxy","model-routing","multi-provider","openrouter","proxy","rate-limiting","sse","tauri","typescript"],"created_at":"2026-04-02T20:12:56.282Z","updated_at":"2026-04-15T06:05:18.199Z","avatar_url":"https://github.com/kianwoon.png","language":"TypeScript","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"gui/icons/icon.png\" alt=\"ModelWeaver\" width=\"96\" /\u003e\n\u003c/p\u003e\n\n# ModelWeaver\n\nMulti-provider LLM proxy for Claude Code. Route different agent roles to different model providers with automatic fallback, racing, circuit breakers, and a native desktop GUI.\n\n## 30-Second Setup\n\n```bash\nnpx @kianwoon/modelweaver init    # pick your provider, paste your API key\nnpx @kianwoon/modelweaver         # start the proxy\n\n# In another terminal — point Claude Code at ModelWeaver:\nexport ANTHROPIC_BASE_URL=http://localhost:3456\nexport ANTHROPIC_API_KEY=unused-but-required\nclaude\n```\n\nNo config file editing. No provider SDK installs. 
The wizard tests your API key and generates the config automatically.\n\n[Full setup guide](#quick-start) · [All CLI commands](#cli-commands) · [Configuration reference](#configuration)\n\n---\n\n[![CI](https://github.com/kianwoon/modelweaver/actions/workflows/ci.yml/badge.svg)](https://github.com/kianwoon/modelweaver/actions/workflows/ci.yml) [![CodeQL](https://github.com/kianwoon/modelweaver/actions/workflows/codeql.yml/badge.svg)](https://github.com/kianwoon/modelweaver/actions/workflows/codeql.yml) [![Release](https://github.com/kianwoon/modelweaver/actions/workflows/release.yml/badge.svg)](https://github.com/kianwoon/modelweaver/actions/workflows/release.yml) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![npm](https://img.shields.io/npm/v/@kianwoon/modelweaver)](https://www.npmjs.com/package/@kianwoon/modelweaver) [![GitHub stars](https://img.shields.io/github/stars/kianwoon/modelweaver?style=social)](https://github.com/kianwoon/modelweaver/stargazers)\n\n\u003cimg width=\"376\" height=\"986\" alt=\"Screenshot 2026-04-06 at 7 31 49 PM\" src=\"https://github.com/user-attachments/assets/7aafc2e2-a358-4fec-bc08-478f68dc24fd\" /\u003e\n\n\n\n## What's New — v0.3.73\n\n- **Smart request routing** — classify message content by complexity and route to the appropriate model tier automatically (#97)\n- **Single-provider hedge skip** — hedging disabled for single-provider chains, prevents rate-limit amplification (#231)\n- **408/504 retry with fresh pool** — request timeout and server-unavailable now retry with a new connection pool (#231)\n- **Transient error detection** — detect and retry on transient errors in 400/413 response bodies (#230)\n- **GET /v1/models endpoint** — list available models from configured providers (#229)\n- **Retry-After header support** — respect provider rate-limit backoff for 429/503 responses (#228)\n- **Streaming-only token speed** — TTFB excluded from 
token-per-second calculations for accurate metrics (#227)\n- **Per-model connection pools** — each model gets its own HTTP/2 connection for TCP isolation (#186)\n- **GOAWAY-aware retry** — graceful HTTP/2 drain no longer marks pool as \"failed\" (#188)\n\n[View all releases](https://github.com/kianwoon/modelweaver/releases) · [Full changelog](CHANGELOG.md)\n\n---\n\n## How It Works\n\nModelWeaver sits between Claude Code and upstream model providers as a local HTTP proxy. It inspects the `model` field in each Anthropic Messages API request and routes it to the best-fit provider.\n\n```\nClaude Code  ──→  ModelWeaver  ──→  Anthropic (primary)\n                   (localhost)   ──→  OpenRouter (fallback)\n                   │\n              0. Classify message content → tier override? (smartRouting)\n              1. Match exact model name (modelRouting)\n              2. Match tier via substring (tierPatterns)\n              3. Fallback on 429 / 5xx errors\n              4. Race remaining providers on 429\n```\n\n## Features\n\n- **Smart request routing** — classify request complexity by message content (regex keyword scoring) and override the model tier automatically\n- **Tier-based routing** — route by model family (sonnet/opus/haiku) using substring pattern matching\n- **Exact model routing** — route specific model names to dedicated providers (checked first)\n- **Automatic fallback** — transparent failover on rate limits (429) and server errors (5xx)\n- **Adaptive racing** — on 429, automatically races remaining providers simultaneously\n- **Model name rewriting** — each provider in the chain can use a different model name\n- **Weighted distribution** — spread traffic across providers by weight percentage\n- **Circuit breaker** — per-provider circuit breaker with closed/open/half-open states, prevents hammering unhealthy providers\n- **Request hedging** — sends multiple copies when a provider shows high latency variance (CV \u003e 0.5), returns the fastest 
response (skipped for single-provider chains to avoid rate-limit amplification)\n- **TTFB timeout** — fails slow providers before full timeout elapses (configurable per provider)\n- **Stall detection** — detects stalled streams and aborts them, triggering fallback\n- **Connection pooling** — per-provider undici Agent dispatcher with configurable pool size\n- **Per-model connection pools** — isolate HTTP/2 connections per model via `modelPools` config for TCP-level isolation\n- **Connection retry** — automatic retry with exponential backoff for stale connections, TTFB timeouts, and GOAWAY drains\n- **Session agent pooling** — reuses HTTP/2 agents across requests within the same session for connection affinity\n- **Adaptive TTFB** — dynamically adjusts TTFB timeout based on observed latency history\n- **GOAWAY-aware retry** — graceful HTTP/2 GOAWAY drain no longer marks pool as \"failed\"\n- **Stream buffering** — optional time-based and size-based SSE buffering (`streamBufferMs`, `streamBufferBytes`)\n- **Health scores** — per-provider health scoring based on latency and error rates\n- **Provider error tracking** — per-provider error counts with status code breakdown, displayed in GUI in real-time\n- **Concurrent limits** — cap concurrent requests per provider\n- **Interactive setup wizard** — guided configuration with API key validation, hedging config, and provider editing\n- **Config hot-reload** — changes to config file are picked up automatically, no restart needed\n- **Daemon mode** — background process with auto-restart, launchd integration, and reload support\n- **Desktop GUI** — native Tauri app with real-time progress bars, provider health, error breakdown, and recent request history\n\n## Why ModelWeaver\n\n### Single-Provider Is a Hobby Setup\n\nRelying on one LLM provider is fine for experiments. For serious development, it's a liability. 
When your provider degrades — rate limits, slow tokens, stalled streams, outright outages — your entire coding session freezes. A 1-hour task becomes a 4-hour wait-and-retry loop.\n\nModelWeaver gives you **high availability for AI coding** — multiple providers, automatic failover, and intelligent traffic management. When one provider goes down, you don't even notice.\n\n**What happens without ModelWeaver:**\n```\n10:00  Coding session starts — everything's fast\n10:30  Token generation slows from 80 tok/s to 3 tok/s\n10:35  Stream stalls mid-response — you wait\n10:40  Retry — 429 rate limit — you wait more\n10:50  Another retry — 502 error — you give up\n11:30  Start over, lost context\nResult: 1-hour job took 4 hours\n```\n\n**What happens with ModelWeaver:**\n```\n10:00  Coding session starts — ModelWeaver routes to Provider A\n10:30  Provider A slows down — hedging detects high latency variance\n       → sends 2 copies simultaneously, returns the fastest\n10:35  Provider A stalls — stall detection aborts in seconds\n       → transparent fallback to Provider B, stream continues\n10:40  Provider A rate limits (429) — remaining providers race simultaneously\n       → recovery in \u003c2s, no context lost\nResult: 1-hour job took 1 hour\n```\n\n### Failure Recovery at a Glance\n\n| Problem | What ModelWeaver Does | Recovery Time |\n|---|---|---|\n| Provider slows down | Hedging sends 2-4 copies, returns fastest | Instant |\n| Stream stalls mid-response | Stall detection aborts, falls back to next provider | Seconds |\n| 429 rate limit | Races all remaining providers simultaneously | \u003c2s |\n| Provider goes down | Circuit breaker opens, traffic rerouted automatically | Seconds |\n| All providers unhealthy | Global backoff returns 503 immediately | Immediate |\n| Stale HTTP/2 connection | Transparent retry with exponential backoff | Transparent |\n| Provider returns errors | Health-score reordering deprioritizes bad providers | Rolling 5-min window |\n\n### Cost Optimization 
Without Quality Loss\n\nRunning everything through one provider at premium rates gets expensive. A full Claude Code session with multiple subagents generates 50-100+ API calls.\n\n- **Weighted routing with health blending**: Traffic automatically shifts toward healthier and cheaper providers when one degrades\n- **Tier-based routing**: Haiku-tier Explore agents (cheap, fast) never accidentally hit Opus-tier pricing. Sonnet coding agents don't burn expensive Opus tokens\n- **Model rewriting per provider**: The same `claude-sonnet-4-6` model name can route to different models on different providers — zero config changes in Claude Code\n\n### Operational Visibility\n\nWhen coding through a proxy, you're normally blind to why responses are slow or failing.\n\n- **Desktop GUI**: Real-time progress bars showing which provider handled each request, response time, and whether hedging fired\n- **Health scores API**: `curl /api/health-scores` shows per-provider scores (0-1). A score of 0.3 means the provider is failing ~50% of requests\n- **Error breakdown**: Per-provider error counts with status code breakdown — spot patterns like a provider returning 502s consistently\n- **Circuit breaker state**: See which providers are open/closed/half-open in real-time\n\n### Zero-Downtime Configuration\n\n- **Hot-reload (300ms debounce)**: Edit `config.yaml` and the daemon picks up changes automatically. No restart, no killed in-flight requests\n- **SIGHUP reload**: After rebuilding from source, `modelweaver reload` restarts the worker without killing the monitor\n\n## Prerequisites\n\n- **Node.js** 20 or later — [Install Node.js](https://nodejs.org)\n- `npx` — included with Node.js (no separate install needed)\n\n## Installation\n\nModelWeaver requires no permanent install — `npx` downloads and runs it on the fly. 
But if you prefer a global install:\n\n```bash\nnpm install -g @kianwoon/modelweaver\n```\n\nAfter that, replace `npx @kianwoon/modelweaver` with `modelweaver` (or the shorter `mw`) in all commands below.\n\n## Quick Start\n\n### 1. Run the setup wizard\n\n```bash\nnpx @kianwoon/modelweaver init\n```\n\nThe wizard guides you through:\n- Selecting from 6 preset providers (Anthropic, OpenRouter, Together AI, GLM/Z.ai, Minimax, Fireworks)\n- Testing API keys to verify connectivity\n- Setting up model routing tiers and hedging config\n- Creating `~/.modelweaver/config.yaml` and `~/.modelweaver/.env`\n\n### 2. Start ModelWeaver\n\n```bash\n# Foreground (see logs in terminal)\nnpx @kianwoon/modelweaver\n\n# Background daemon (auto-restarts on crash)\nnpx @kianwoon/modelweaver start\n\n# Install as launchd service (auto-start at login)\nnpx @kianwoon/modelweaver install\n```\n\n### 3. Point Claude Code to ModelWeaver\n\n```bash\nexport ANTHROPIC_BASE_URL=http://localhost:3456\nexport ANTHROPIC_API_KEY=unused-but-required\nclaude\n```\n\n## CLI Commands\n\n```bash\nnpx @kianwoon/modelweaver init              # Interactive setup wizard\nnpx @kianwoon/modelweaver start             # Start as background daemon\nnpx @kianwoon/modelweaver stop              # Stop background daemon\nnpx @kianwoon/modelweaver status            # Show daemon status + service state\nnpx @kianwoon/modelweaver remove            # Stop daemon + remove PID and log files\nnpx @kianwoon/modelweaver reload            # Reload daemon worker (after rebuild)\nnpx @kianwoon/modelweaver install           # Install launchd service (auto-start at login)\nnpx @kianwoon/modelweaver uninstall         # Uninstall launchd service\nnpx @kianwoon/modelweaver gui               # Launch desktop GUI (auto-downloads binary)\nnpx @kianwoon/modelweaver [options]         # Run in foreground\n```\n\n### CLI Options\n\n```\n  -p, --port \u003cnumber\u003e      Server port                    (default: from config)\n  -c, 
--config \u003cpath\u003e      Config file path               (auto-detected)\n  -v, --verbose            Enable debug logging           (default: off)\n  -h, --help               Show help\n```\n\n### Init Options\n\n```\n  --global                 Edit global config only\n  --path \u003cfile\u003e            Write config to a specific file\n```\n\n## Daemon Mode\n\nRun ModelWeaver as a background process that survives terminal closure and auto-recovers from crashes.\n\n```bash\nnpx @kianwoon/modelweaver start             # Start (forks monitor + daemon)\nnpx @kianwoon/modelweaver status            # Check if running\nnpx @kianwoon/modelweaver reload            # Reload worker after rebuild\nnpx @kianwoon/modelweaver stop              # Graceful stop (SIGTERM → SIGKILL after 5s)\nnpx @kianwoon/modelweaver remove            # Stop + remove PID file + log file\nnpx @kianwoon/modelweaver install           # Install launchd service\nnpx @kianwoon/modelweaver uninstall         # Uninstall launchd service\n```\n\n**How it works**: `start` forks a lightweight monitor process that owns the PID file. The monitor spawns the actual daemon worker. If the worker crashes, the monitor auto-restarts it with exponential backoff starting at 500ms (up to 10 attempts). After 60 seconds of stable running, the restart counter resets.\n\n```\nmodelweaver.pid        → Monitor process (handles signals, watches child)\n  └── modelweaver.worker.pid → Daemon worker (runs HTTP server)\n```\n\n**Files**:\n- `~/.modelweaver/modelweaver.pid` — monitor PID\n- `~/.modelweaver/modelweaver.worker.pid` — worker PID\n- `~/.modelweaver/modelweaver.log` — daemon output log\n\n## Desktop GUI\n\nModelWeaver ships a native desktop GUI built with Tauri. No Rust toolchain needed — the binary is auto-downloaded from GitHub Releases.\n\n```bash\nnpx @kianwoon/modelweaver gui\n```\n\nFirst run downloads the latest binary for your platform (~10-30 MB). 
Subsequent launches use the cached version.\n\n**GUI features:**\n- Real-time progress bars with provider name and model info\n- Provider health cards with error counts and status code breakdown\n- Recent request history sorted by timestamp\n- Config validation error banner\n- Auto-reconnect on daemon restart\n\n**Supported platforms:**\n\n| Platform | Format |\n|---|---|\n| macOS (Apple Silicon) | `.dmg` |\n| macOS (Intel) | `.dmg` |\n| Linux (x86_64) | `.AppImage` |\n| Windows (x86_64) | `.msi` |\n\n**Cached files** are stored in `~/.modelweaver/gui/` with version tracking — new versions download automatically on the next `gui` launch.\n\n## Configuration\n\n### Config file locations\n\nChecked in order (first found wins):\n1. `./modelweaver.yaml` (project-local)\n2. `~/.modelweaver/config.yaml` (user-global)\n\n### Full config schema\n\n```yaml\nserver:\n  port: 3456                  # Server port          (default: 3456)\n  host: localhost             # Bind address         (default: localhost)\n  streamBufferMs: 0           # Time-based stream flush threshold  (default: disabled)\n  streamBufferBytes: 0        # Size-based stream flush threshold  (default: disabled)\n  globalBackoffEnabled: true  # Global backoff on repeated failures (default: true)\n  unhealthyThreshold: 0.5     # Health score below which provider is unhealthy (default: 0.5, 0–1)\n  maxBodySizeMB: 10           # Max request body size in MB        (default: 10, 1–100)\n  sessionIdleTtlMs: 600000    # Session agent pool idle TTL in ms  (default: 600000 / 10min, min: 60000)\n  disableThinking: false      # Strip thinking blocks from requests (default: false)\n\n# Adaptive request hedging\nhedging:\n  speculativeDelay: 500       # ms before starting backup providers  (default: 500)\n  cvThreshold: 0.5            # latency CV threshold for hedging    (default: 0.5)\n  maxHedge: 4                 # max concurrent copies per request    (default: 4)\n\nproviders:\n  anthropic:\n    baseUrl: 
https://api.anthropic.com\n    apiKey: ${ANTHROPIC_API_KEY}  # Env var substitution\n    timeout: 20000                # Request timeout in ms  (default: 20000)\n    ttfbTimeout: 8000             # TTFB timeout in ms     (default: 8000)\n    stallTimeout: 15000           # Stall detection timeout (default: 15000)\n    poolSize: 10                  # Connection pool size   (default: 10)\n    concurrentLimit: 10           # Max concurrent requests (default: unlimited)\n    connectionRetries: 3          # Retries for stale connections (default: 3, max: 10)\n    staleAgentThresholdMs: 30000  # Mark pooled agent stale after idle ms (optional)\n    rateLimitBackoffMs: 2000      # Backoff after 429/503 in ms (optional, overrides Retry-After)\n    retryableErrorPatterns:       # Regex patterns for retryable error messages (optional)\n      - \"network error\"\n      - \"system error\"\n    modelPools:                   # Per-model pool size overrides (optional)\n      \"claude-sonnet-4-20250514\": 20\n    modelLimits:                  # Per-provider token limits (optional)\n      maxOutputTokens: 16384\n    authType: anthropic           # \"anthropic\" | \"bearer\"  (default: anthropic)\n    circuitBreaker:               # Per-provider circuit breaker (optional)\n      failureThreshold: 3         # Failures before opening circuit (alias: threshold, default: 3)\n      windowSeconds: 60           # Time window for failure count  (default: 60)\n      cooldownSeconds: 30         # Cooldown in seconds (alias: cooldown, also in seconds, default: 30)\n      rateLimitCooldownSeconds: 10  # Shorter cooldown for 429 rate limits (optional)\n  openrouter:\n    baseUrl: https://openrouter.ai/api\n    apiKey: ${OPENROUTER_API_KEY}\n    authType: bearer\n    timeout: 60000\n\n# Exact model name routing (checked FIRST, before tier patterns)\nmodelRouting:\n  \"glm-5-turbo\":\n    - provider: anthropic\n  \"MiniMax-M2.7\":\n    - provider: openrouter\n      model: minimax/MiniMax-M2.7      
  # With model name rewrite\n  # Weighted distribution example:\n  # \"claude-sonnet-4\":\n  #   - provider: anthropic\n  #     weight: 70\n  #   - provider: openrouter\n  #     weight: 30\n\n# Tier-based routing (fallback chain)\nrouting:\n  sonnet:\n    - provider: anthropic\n      model: claude-sonnet-4-20250514      # Optional: rewrite model name\n    - provider: openrouter\n      model: anthropic/claude-sonnet-4      # Fallback\n  opus:\n    - provider: anthropic\n      model: claude-opus-4-20250514\n  haiku:\n    - provider: anthropic\n      model: claude-haiku-4-5-20251001\n\n# Pattern matching: model name includes any string → matched to tier\ntierPatterns:\n  sonnet: [\"sonnet\", \"3-5-sonnet\", \"3.5-sonnet\"]\n  opus: [\"opus\", \"3-opus\", \"3.5-opus\"]\n  haiku: [\"haiku\", \"3-haiku\", \"3.5-haiku\"]\n\n# Smart request routing — classify message content and override model tier\n# When enabled, analyzes the last user message against regex patterns.\n# If cumulative score \u003e= escalationThreshold, routes to the classified tier\n# instead of the model requested. Disabled by default.\n# smartRouting:\n#   enabled: true\n#   escalationThreshold: 2    # minimum score to trigger tier override\n#   patterns:\n#     \"1\":                     # Tier 1 — best model (e.g., opus-tier)\n#       - pattern: \"architect|design system|from scratch\"\n#         score: 3\n#       - pattern: \"debug|troubleshoot|investigate|root cause\"\n#         score: 2\n#     \"2\":                     # Tier 2 — good model (e.g., sonnet-tier)\n#       - pattern: \"explain|summarize|compare\"\n#         score: 2\n#       - pattern: \"write.*test|refactor|review\"\n#         score: 2\n# Requires matching routing entries: routing.tier1, routing.tier2 (tier3 optional)\n# Graceful degradation: if classified tier has no providers, tries next tier down\n```\n\n### Routing priority\n\n1. 
**Smart content routing** (`smartRouting`) — if enabled and message content matches classification patterns, override to the classified tier (bypasses all other routing)\n2. **Exact model name** (`modelRouting`) — if the request model matches exactly, use that route\n3. **Weighted distribution** — if the model has `weight` entries, requests are distributed across providers proportionally\n4. **Tier pattern** (`tierPatterns` + `routing`) — substring match the model name against patterns, then use the tier's provider chain\n5. **No match** — returns 502 with a descriptive error listing configured tiers and model routes\n\n### Provider chain behavior\n\n- **First provider is primary**, rest are fallbacks\n- **Fallback triggers** on: 429 (rate limit), 5xx (server error), network timeout, stream stall\n- **Adaptive race mode** — when a 429 is received, remaining providers are raced simultaneously (not sequentially) for faster recovery\n- **Circuit breaker** — providers that repeatedly fail are temporarily skipped (auto-recovers after cooldown, configurable window)\n- **Hedging skip** — single-provider chains skip hedging entirely (multi-copy to one provider amplifies rate limits without improving outcome)\n- **No fallback on**: 4xx (bad request, auth failure, forbidden) — returned immediately (except 429 and transient errors in 400/413 bodies)\n- **Model rewriting**: each provider entry can override the `model` field in the request body\n\n### Supported providers\n\n| Provider | Auth Type | Base URL |\n|---|---|---|\n| Anthropic | `x-api-key` | `https://api.anthropic.com` |\n| OpenRouter | Bearer | `https://openrouter.ai/api` |\n| Together AI | Bearer | `https://api.together.xyz` |\n| GLM (Z.ai) | `x-api-key` | `https://api.z.ai/api/anthropic` |\n| Minimax | `x-api-key` | `https://api.minimax.io/anthropic` |\n| Fireworks | Bearer | `https://api.fireworks.ai/inference/v1` |\n\nAny OpenAI/Anthropic-compatible API works — just set `baseUrl` and `authType` 
appropriately.\n\n### Config hot-reload\n\nIn daemon mode, ModelWeaver watches the config file for changes and reloads automatically (debounced 300ms). You can also send a manual reload signal:\n\n```bash\nkill -SIGHUP $(cat ~/.modelweaver/modelweaver.pid)\n```\n\nOr use the CLI:\n\n```bash\nnpx @kianwoon/modelweaver reload\n```\n\nRe-running `npx @kianwoon/modelweaver init` also signals the running daemon to reload.\n\n## API\n\n### Health check\n\n```bash\ncurl http://localhost:3456/api/status\n```\n\nReturns circuit breaker state for all providers and server uptime.\n\n### Version\n\n```bash\ncurl http://localhost:3456/api/version\n```\n\nReturns the running ModelWeaver version.\n\n### Available models\n\n```bash\ncurl http://localhost:3456/v1/models\n```\n\nReturns the list of available models from configured providers (Anthropic-compatible format).\n\n### Connection pool status\n\n```bash\ncurl http://localhost:3456/api/pool\n```\n\nReturns active connection pool state for all providers.\n\n### Health scores\n\n```bash\ncurl http://localhost:3456/api/health-scores\n```\n\nReturns per-provider health scores based on latency and error rates.\n\n### Session pool status\n\n```bash\ncurl http://localhost:3456/api/sessions\n```\n\nReturns session agent pool statistics.\n\n## Observability\n\n```bash\n# Aggregated request metrics (by model, provider, error type)\ncurl http://localhost:3456/api/metrics/summary\n\n# Per-provider circuit breaker state\ncurl http://localhost:3456/api/circuit-breaker\n\n# Hedging win/loss statistics\ncurl http://localhost:3456/api/hedging/stats\n```\n\n## How Claude Code Uses Model Tiers\n\nClaude Code sends different model names for different agent roles:\n\n| Agent Role | Model Tier | Typical Model Name |\n|---|---|---|\n| Main conversation, coding | Sonnet | `claude-sonnet-4-20250514` |\n| Explore (codebase search) | Haiku | `claude-haiku-4-5-20251001` |\n| Plan (analysis) | Sonnet | `claude-sonnet-4-20250514` |\n| Complex subagents | 
Opus | `claude-opus-4-20250514` |\n| GLM/Z.ai models | Exact routing | `glm-5-turbo` |\n| MiniMax models | Exact routing | `MiniMax-M2.7` |\n\nModelWeaver uses the model name to determine which agent tier is calling, then routes accordingly.\n\n## Development\n\n```bash\nnpm install          # Install dependencies\nnpm test             # Run tests (307 tests)\nnpm run build        # Build for production (tsup)\nnpm run dev          # Run in dev mode (tsx)\n```\n\n## FAQ\n\n**Why do I need `ANTHROPIC_API_KEY=unused-but-required`?**\n\nClaude Code validates that `ANTHROPIC_API_KEY` is set before connecting. ModelWeaver handles real auth to upstream providers — the env var just satisfies Claude Code's startup check.\n\n**Port 3456 is already in use.**\n\nSomething else is running on that port. Either stop it, or set a different port in your config:\n\n```yaml\nserver:\n  port: 8080\n```\n\nThen update `ANTHROPIC_BASE_URL` to match.\n\n**How do I know ModelWeaver is running?**\n\n```bash\ncurl http://localhost:3456/api/status\n```\n\nReturns JSON with uptime and circuit breaker state. Or check the GUI:\n\n```bash\nnpx @kianwoon/modelweaver gui\n```\n\n**How do I switch providers?**\n\nRun `npx @kianwoon/modelweaver init` again — it opens your existing config for editing. Or edit `~/.modelweaver/config.yaml` directly (hot-reloaded automatically in daemon mode).\n\n## License\n\nApache-2.0\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkianwoon%2Fmodelweaver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkianwoon%2Fmodelweaver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkianwoon%2Fmodelweaver/lists"}