{"id":47982520,"url":"https://github.com/ericflo/modelrelay","last_synced_at":"2026-04-18T09:02:56.473Z","repository":{"id":348787730,"uuid":"1199884660","full_name":"ericflo/modelrelay","owner":"ericflo","description":"Central HTTP LLM proxy that routes inference requests to authenticated remote workers over WebSocket — queueing, streaming, and cancellation included.","archived":false,"fork":false,"pushed_at":"2026-04-09T05:02:20.000Z","size":2127,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-09T05:03:09.945Z","etag":null,"topics":["anthropic-compatible","gpu","inference","llama","llm","openai-compatible","proxy","rust","websocket","worker"],"latest_commit_sha":null,"homepage":"https://ericflo.github.io/modelrelay/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ericflo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-02T20:11:00.000Z","updated_at":"2026-04-09T04:11:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ericflo/modelrelay","commit_stats":null,"previous_names":["ericflo/llm-worker-proxy","ericflo/modelrelay"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/ericflo/modelrelay","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Fmodelrelay","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Fmodelrelay/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Fmodelrelay/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Fmodelrelay/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ericflo","download_url":"https://codeload.github.com/ericflo/modelrelay/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Fmodelrelay/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31875183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"online","status_checked_at":"2026-04-16T02:00:06.042Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic-compatible","gpu","inference","llama","llm","openai-compatible","proxy","rust","websocket","worker"],"created_at":"2026-04-04T11:14:28.948Z","updated_at":"2026-04-16T07:00:55.383Z","avatar_url":"https://github.com/ericflo.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CI](https://github.com/ericflo/modelrelay/actions/workflows/ci.yml/badge.svg)](https://github.com/ericflo/modelrelay/actions/workflows/ci.yml)\n[![Latest Release](https://img.shields.io/github/v/release/ericflo/modelrelay)](https://github.com/ericflo/modelrelay/releases/latest)\n[![Coverage](https://codecov.io/gh/ericflo/modelrelay/branch/main/graph/badge.svg)](https://codecov.io/gh/ericflo/modelrelay)\n[![crates.io](https://img.shields.io/crates/v/modelrelay-protocol)](https://crates.io/crates/modelrelay-protocol)\n[![Minimum Rust Version](https://img.shields.io/badge/rustc-1.94+-orange.svg)](rust-toolchain.toml)\n[![Documentation](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://ericflo.github.io/modelrelay/)\n[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n# ModelRelay\n\n**Stop configuring clients for every GPU box. Workers connect out; requests route in.**\n\nYou have GPU boxes running `llama-server` (or Ollama, or vLLM, or anything OpenAI-compatible). Today you either expose each one directly — port forwarding, DNS, firewall rules — or you stick a load balancer in front that doesn't understand LLM streaming or cancellation.\n\nModelRelay flips the model: a central proxy receives standard inference requests while worker daemons on your GPU boxes connect *out* to it over WebSocket. The proxy handles queueing, routing, streaming pass-through, and cancellation propagation. Clients see one stable endpoint and never need to know about your hardware.\n\n```\n  Clients (curl, Claude Code, LiteLLM, Open WebUI, ...)\n         │\n         │  POST /v1/chat/completions\n         │  POST /v1/messages\n         ▼\n  ┌──────────────────────┐\n  │   modelrelay-server  │◄─── workers connect out (WebSocket)\n  │   (one stable        │     no inbound ports needed on GPU boxes\n  │    endpoint)         │\n  └──────────────────────┘\n         │  routes request to best available worker\n         ▼\n  ┌────────┐  ┌────────┐  ┌────────┐\n  │worker-1│  │worker-2│  │worker-3│\n  │ llama  │  │ ollama │  │ vllm   │  ← your GPU boxes,\n  │ server │  │        │  │        │    anywhere on any network\n  └────────┘  └────────┘  └────────┘\n```\n\n## Desktop App\n\nModelRelay Desktop is a native tray application that wraps the worker daemon in a lightweight GUI. It stays in your system tray and manages the connection to your relay server — no terminal required.\n\n**Features:**\n- System tray icon showing connection status (connected / disconnected / relaying)\n- Settings UI for backend URL, relay server, worker secret, model selection, and poll interval\n- Auto-reconnect on connection loss with status notifications\n- Auto-start on login\n- Live model list that refreshes as your backend models change\n\n**Download:** Grab the latest installer for your platform from the [Desktop Releases](https://github.com/ericflo/modelrelay/releases?q=desktop) page.\n\n| Platform | Installer |\n|----------|-----------|\n| Windows | `.msi` or `.exe` |\n| macOS | `.dmg` |\n| Linux | `.AppImage` or `.deb` |\n\n**Getting started:**\n1. Download and install the app for your platform\n2. Launch ModelRelay Desktop — it appears in your system tray\n3. Right-click the tray icon and open **Settings**\n4. Enter your backend URL (e.g. `http://127.0.0.1:8000`), relay server URL, and worker secret\n5. Click **Connect** — the tray icon updates to show your connection status\n\nThe desktop app uses the same `modelrelay-worker` library under the hood, so it supports all the same backends (llama-server, Ollama, vLLM, LM Studio, etc.).\n\n## Who is this for?\n\n- **Home GPU users** running local models who want a single API endpoint across multiple machines\n- **Teams with on-prem hardware** that need to pool GPU capacity without a service mesh\n- **Researchers** juggling models across heterogeneous boxes who are tired of updating client configs\n\n## Why this instead of...\n\n| Alternative | What's missing |\n|---|---|\n| **Pointing clients directly at llama-server** | No HA, no queue, clients must know about every box, no cancellation |\n| **nginx / HAProxy** | Doesn't understand LLM streaming semantics, no queueing, no worker auth, no cancellation propagation |\n| **LiteLLM / OpenRouter** | Cloud-first routing — not designed for your own private hardware calling home |\n\n## Hosted Version\n\nDon't want to run the infrastructure yourself? A fully-managed hosted version is available at [modelrelay.io](https://modelrelay.io) — no server setup, no infrastructure to manage. Just get an API key, point your workers at it, and start routing requests. Same open protocol, zero ops burden.\n\n## Quickstart\n\n### Pre-built binaries (recommended)\n\nPre-built binaries are the fastest way to get started. Download the latest release for your platform from the [Releases page](https://github.com/ericflo/modelrelay/releases):\n\n| Platform | modelrelay-server | modelrelay-worker |\n|----------|-------------------|-------------------|\n| Linux x86_64 | `modelrelay-server-linux-amd64` | `modelrelay-worker-linux-amd64` |\n| Linux arm64 | `modelrelay-server-linux-arm64` | `modelrelay-worker-linux-arm64` |\n| macOS Intel | `modelrelay-server-darwin-amd64` | `modelrelay-worker-darwin-amd64` |\n| macOS Apple Silicon | `modelrelay-server-darwin-arm64` | `modelrelay-worker-darwin-arm64` |\n| Windows x86_64 | `modelrelay-server-windows-amd64.exe` | `modelrelay-worker-windows-amd64.exe` |\n| Windows arm64 | `modelrelay-server-windows-arm64.exe` | `modelrelay-worker-windows-arm64.exe` |\n\n**Start the proxy:**\n\n```bash\n./modelrelay-server \\\n  --listen 0.0.0.0:8080 \\\n  --worker-secret mysecret\n```\n\n**Start a worker** (on a GPU box with `llama-server`, Ollama, vLLM, or any OpenAI-compatible backend):\n\n```bash\n./modelrelay-worker \\\n  --proxy-url http://\u003cproxy-host\u003e:8080 \\\n  --worker-secret mysecret \\\n  --backend-url http://127.0.0.1:8000 \\\n  --models llama3.2:3b,llama3.2:1b\n```\n\n### Docker\n\nPre-built images are published to GitHub Container Registry on every release and main push.\n\n```bash\n# Pull the latest images\ndocker pull ghcr.io/ericflo/modelrelay/modelrelay-server:latest\ndocker pull ghcr.io/ericflo/modelrelay/modelrelay-worker:latest\n\n# Run the proxy\ndocker run -p 8080:8080 \\\n  -e WORKER_SECRET=mysecret \\\n  -e LISTEN_ADDR=0.0.0.0:8080 \\\n  ghcr.io/ericflo/modelrelay/modelrelay-server:latest\n\n# Run a worker (on a GPU box)\ndocker run \\\n  -e PROXY_URL=http://\u003cproxy-host\u003e:8080 \\\n  -e WORKER_SECRET=mysecret \\\n  -e BACKEND_URL=http://host.docker.internal:8000 \\\n  -e MODELS=llama3.2:3b \\\n  ghcr.io/ericflo/modelrelay/modelrelay-worker:latest\n```\n\nFor pinned versions, replace `:latest` with a release tag (e.g. `:0.2.1`).\n\n### Docker Compose (easiest for local dev)\n\n```bash\ngit clone https://github.com/ericflo/modelrelay.git\ncd modelrelay\n\n# Start the proxy + one worker (assumes llama-server on host port 8081)\ndocker compose up\n```\n\nThe proxy is now listening on `http://localhost:8080`. The worker connects to it automatically and forwards requests to your backend.\n\n### From crates.io\n\n\u003e **Note:** The crates are not yet published to crates.io. Use [pre-built binaries](#pre-built-binaries-recommended) or [Docker](#docker) in the meantime. See [CONTRIBUTING.md](CONTRIBUTING.md#ci-secrets) for how to configure the `CRATES_IO_TOKEN` secret for publishing.\n\n```bash\ncargo install modelrelay-server modelrelay-worker\n```\n\n### Build from source\n\n```bash\ncargo build --release\n# Binaries: target/release/modelrelay-server  target/release/modelrelay-worker\n```\n\n### Try it\n\n```bash\n# Non-streaming\ncurl http://localhost:8080/v1/chat/completions \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"model\": \"llama3.2:3b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}],\n    \"stream\": false\n  }'\n\n# Streaming (SSE chunks pass through from the backend)\ncurl http://localhost:8080/v1/chat/completions \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"model\": \"llama3.2:3b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}],\n    \"stream\": true\n  }'\n```\n\n## Connecting your tools\n\nOnce the proxy is running, point your existing tools at it — no special client needed.\n\n**curl** — see [Try it](#try-it) above.\n\n**Claude Code / Claude Desktop** — set the base URL to your proxy:\n\n```bash\nexport ANTHROPIC_BASE_URL=http://localhost:8080\nclaude    # requests now route through ModelRelay\n```\n\n**LiteLLM** — add a model entry in your `config.yaml`:\n\n```yaml\nmodel_list:\n  - model_name: llama3.2:3b\n    litellm_params:\n      model: openai/llama3.2:3b\n      api_base: http://localhost:8080/v1\n```\n\n**Open WebUI** — point the OpenAI-compatible backend at the proxy:\n\n```bash\nexport OPENAI_API_BASE_URL=http://localhost:8080/v1\n```\n\nAny tool that speaks OpenAI or Anthropic API formats works — just change the base URL.\n\n## Make it persistent\n\nOnce your worker is running, set it up as a system service so it starts automatically on boot:\n\n- **Linux (systemd):** Use the template unit in [`extras/modelrelay-worker@.service`](extras/modelrelay-worker@.service) — supports multiple workers per machine (`modelrelay-worker@gpu0`, `@gpu1`, etc.). See [Systemd](#systemd-bare-metal--vm) below for full instructions.\n- **macOS (launchd):** Create a Launch Daemon plist pointing at the binary and your `config.toml`. The worker starts on boot and restarts on crash.\n- **Windows (Service):** Register with `sc.exe create` and set env vars with `[Environment]::SetEnvironmentVariable`. See [Windows Service](#windows-service) below for full instructions.\n\nThe setup wizard at `/setup` in the web UI walks through this interactively with copy-paste commands.\n\n## llamafile Integration\n\nThe `extras/modelrelay-llamafile` script is a self-contained CLI for downloading, running, and relaying [llamafile](https://mozilla-ai.github.io/llamafile/) models through ModelRelay. No dependencies beyond bash and curl.\n\n```bash\n# See what fits your hardware\n./extras/modelrelay-llamafile recommend\n\n# Browse models by category\n./extras/modelrelay-llamafile list --tag reasoning\n\n# Save your relay config once\n./extras/modelrelay-llamafile config set proxy-url https://relay.example.com\n./extras/modelrelay-llamafile config set worker-secret mysecret\n\n# Now just serve — no flags needed\n./extras/modelrelay-llamafile serve qwen3.5-4b\n\n# Verify it works end-to-end\n./extras/modelrelay-llamafile test qwen3.5-4b\n\n# Manage running models\n./extras/modelrelay-llamafile status\n./extras/modelrelay-llamafile logs qwen3.5-4b -f\n./extras/modelrelay-llamafile stop all\n\n# Import your own llamafiles\n./extras/modelrelay-llamafile import ./my-model.llamafile --slug my-model\n\n# Refresh catalog when Mozilla publishes new models\n./extras/modelrelay-llamafile update-catalog\n```\n\nRun `./extras/modelrelay-llamafile help` for full usage, or `./extras/modelrelay-llamafile doctor` to check system readiness.\n\n## Features\n\n- **Cross-platform** — pre-built binaries for Linux, macOS, and Windows (x86_64 + arm64)\n- **OpenAI + Anthropic compatible** — `POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `GET /v1/models`\n- **No inbound ports on GPU boxes** — workers connect out to the proxy over WebSocket\n- **Request queueing** — configurable depth and timeout when all workers are busy\n- **Streaming pass-through** — SSE chunks forwarded with preserved ordering and termination\n- **End-to-end cancellation** — client disconnect propagates through the proxy to the worker to the backend\n- **Automatic requeue** — if a worker dies mid-request, the request is requeued to another worker\n- **Heartbeat and load tracking** — stale workers are cleaned up; workers report current load\n- **Graceful drain** — workers can shut down while replacement workers pick up queued work\n- **Model catalog refresh** — workers can update their model list without reconnecting\n- **Auth cooldown recovery** — workers recover gracefully from authentication failures\n\n## Configuration\n\n### modelrelay-server\n\n| Flag | Env var | Default | Description |\n|------|---------|---------|-------------|\n| `--listen` | `LISTEN_ADDR` | `127.0.0.1:8080` | Address to listen on |\n| `--worker-secret` | `WORKER_SECRET` | *(required)* | Secret workers must present to authenticate |\n| `--provider` | `PROVIDER_NAME` | `local` | Provider name used for worker routing and request dispatch |\n| `--max-queue-len` | `MAX_QUEUE_LEN` | `100` | Maximum number of queued requests (0 = unlimited) |\n| `--queue-timeout` | `QUEUE_TIMEOUT_SECS` | `30` | Seconds before a queued request times out (0 = no timeout) |\n| `--request-timeout` | `REQUEST_TIMEOUT_SECS` | `300` | Seconds before an in-flight HTTP request times out (0 = no timeout) |\n| `--log-level` | `LOG_LEVEL` | `info` | Log level filter (e.g. `info`, `debug`, or `modelrelay_server=debug`). Overridden by `RUST_LOG` if set. |\n| `--admin-token` | `MODELRELAY_ADMIN_TOKEN` | *(none)* | Bearer token for `/admin/*` endpoints. If unset, admin endpoints return 403. |\n| `--require-api-keys` | `MODELRELAY_REQUIRE_API_KEYS` | `false` | When `true`, client inference requests must include a valid API key as Bearer token. |\n\n### modelrelay-worker\n\n| Flag | Env var | Default | Description |\n|------|---------|---------|-------------|\n| `--proxy-url` | `PROXY_URL` | `http://127.0.0.1:8080` | Base URL of the proxy server |\n| `--worker-secret` | `WORKER_SECRET` | *(required)* | Secret used to authenticate with the proxy |\n| `--backend-url` | `BACKEND_URL` | `http://127.0.0.1:8000` | Base URL of the local model backend |\n| `--models` | `MODELS` | `default` | Comma-separated list of model names this worker supports |\n| `--provider` | `PROVIDER_NAME` | `local` | Provider name to register with on the proxy |\n| `--worker-name` | `WORKER_NAME` | `worker` | Human-readable name for this worker instance |\n| `--max-concurrency` | `MAX_CONCURRENCY` | `1` | Maximum number of concurrent requests this worker will handle |\n| `--log-level` | `LOG_LEVEL` | `info` | Log level filter (e.g. `info`, `debug`, or `modelrelay_worker=debug`). Overridden by `RUST_LOG` if set. |\n\nAll flags can be passed as CLI arguments or set via the corresponding environment variable.\n\n## Admin API \u0026 Web Dashboard\n\nModelRelay includes built-in admin endpoints for monitoring and an embedded web dashboard for managing your deployment.\n\n### Admin API Endpoints\n\n| Method | Path | Auth | Description |\n|--------|------|------|-------------|\n| GET | `/health` | None | Basic health check — returns version, worker count, queue depth, and uptime |\n| GET | `/admin/workers` | Admin token | List connected workers with models, load, and capabilities |\n| GET | `/admin/stats` | Admin token | Request counts, queue depth per provider |\n| GET | `/admin/keys` | Admin token | List client API key metadata (no secrets) |\n| POST | `/admin/keys` | Admin token | Create a new client API key — returns the secret once |\n| DELETE | `/admin/keys/{id}` | Admin token | Revoke a client API key |\n\n### Admin Authentication\n\nAll `/admin/*` endpoints require a Bearer token matching `MODELRELAY_ADMIN_TOKEN`:\n\n```bash\n# Set the admin token when starting the server\nmodelrelay-server --worker-secret mysecret --admin-token my-admin-secret\n\n# Query admin endpoints\ncurl -H \"Authorization: Bearer my-admin-secret\" http://localhost:8080/admin/workers\ncurl -H \"Authorization: Bearer my-admin-secret\" http://localhost:8080/admin/stats\n```\n\nIf `MODELRELAY_ADMIN_TOKEN` is not set, all admin endpoints return `403 Forbidden`.\n\n### Client API Key Authentication\n\nWhen `MODELRELAY_REQUIRE_API_KEYS` is set to `true`, clients must include a valid API key as a Bearer token on inference requests (`/v1/chat/completions`, `/v1/messages`, etc.). Without a valid key, requests are rejected with `401 Unauthorized`.\n\n```bash\n# Start the server with API key auth enabled\nmodelrelay-server --worker-secret mysecret --admin-token my-admin-secret --require-api-keys true\n\n# Create a client API key (the secret is returned only once)\ncurl -X POST -H \"Authorization: Bearer my-admin-secret\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"name\": \"my-app\"}' \\\n  http://localhost:8080/admin/keys\n\n# Use the key for inference\ncurl -H \"Authorization: Bearer mr-...\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"llama3.2:3b\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}' \\\n  http://localhost:8080/v1/chat/completions\n\n# Revoke a key\ncurl -X DELETE -H \"Authorization: Bearer my-admin-secret\" \\\n  http://localhost:8080/admin/keys/{key-id}\n```\n\nWhen `MODELRELAY_REQUIRE_API_KEYS` is `false` (the default), inference endpoints accept requests without any authentication.\n\n### Web Dashboard \u0026 Setup Wizard\n\nThe `modelrelay-web` crate provides an embedded web UI served by the proxy:\n\n- **Dashboard** at `/dashboard` — real-time view of connected workers, request metrics, and queue depth\n- **Setup Wizard** at `/setup` — step-by-step guide for connecting new workers (platform detection, backend configuration, worker binary download, and live connection verification)\n\nThe setup wizard is always accessible — not just on first run. Use it to add additional GPU boxes to your fleet at any time.\n\n## Production deployment\n\n### Docker Compose (multi-worker)\n\nThe included [`docker-compose.yml`](docker-compose.yml) runs the proxy with two workers, health checks, restart policies, memory limits, and log rotation:\n\n```bash\ncp .env.example .env   # edit WORKER_SECRET and backend URLs\ndocker compose up -d\n```\n\nAdd more workers by duplicating a worker service block and adjusting `MODELS`, `BACKEND_URL`, and `WORKER_NAME`.\n\n### Systemd (bare metal / VM)\n\nService files live in [`extras/`](extras/):\n\n```bash\n# Install binaries (from a release archive or cargo build --release)\nsudo install -m 755 modelrelay-server modelrelay-worker /usr/local/bin/\n\n# Create a service user\nsudo useradd --system --no-create-home modelrelay\nsudo mkdir -p /var/lib/modelrelay /etc/modelrelay\n\n# Proxy\nsudo cp extras/modelrelay-server.service /etc/systemd/system/\nsudo cp extras/proxy.env.example /etc/modelrelay/proxy.env\nsudo vim /etc/modelrelay/proxy.env   # set WORKER_SECRET\nsudo systemctl enable --now modelrelay-server\n\n# Workers — the template unit lets you run multiple instances:\nsudo cp extras/modelrelay-worker@.service /etc/systemd/system/\nsudo cp extras/worker.env.example /etc/modelrelay/worker-gpu0.env\nsudo vim /etc/modelrelay/worker-gpu0.env   # set PROXY_URL, BACKEND_URL, MODELS\nsudo systemctl enable --now modelrelay-worker@gpu0\n```\n\nSee [`extras/`](extras/) for the full service files and annotated env examples.\n\n### Windows Service\n\nModelRelay ships Windows binaries that can run as native Windows Services using `sc.exe`. No third-party service wrappers required.\n\n```powershell\n# Install the server as a service (run as Administrator)\nsc.exe create ModelRelayServer binPath= \"C:\\ModelRelay\\modelrelay-server.exe\" start= auto\n\n# Set environment variables for the service (system-wide, persists across reboots)\n[Environment]::SetEnvironmentVariable(\"WORKER_SECRET\", \"your-secret-here\", \"Machine\")\n[Environment]::SetEnvironmentVariable(\"LISTEN_ADDR\", \"0.0.0.0:8080\", \"Machine\")\n\n# Start the service\nStart-Service ModelRelayServer\n\n# Install a worker service\nsc.exe create ModelRelayWorker binPath= '\"C:\\ModelRelay\\modelrelay-worker.exe\" --models llama3-8b' start= auto\n[Environment]::SetEnvironmentVariable(\"PROXY_URL\", \"http://your-proxy:8080\", \"Machine\")\n[Environment]::SetEnvironmentVariable(\"BACKEND_URL\", \"http://localhost:8000\", \"Machine\")\nStart-Service ModelRelayWorker\n```\n\nFor fully annotated install scripts with error handling and uninstall support, see [`extras/install-windows-service.ps1`](extras/install-windows-service.ps1) and [`extras/install-windows-service-worker.ps1`](extras/install-windows-service-worker.ps1). The service runs as `LocalSystem` by default; to use a dedicated account, set the service log-on via `services.msc` or pass `obj=` and `password=` to `sc.exe create`.\n\n### TLS\n\nThe proxy and workers communicate over plain HTTP/WebSocket by default. For production, terminate TLS at a reverse proxy like nginx. An annotated configuration is provided at [`examples/tls-nginx.conf`](examples/tls-nginx.conf) — it handles HTTPS for client requests and `wss://` WebSocket upgrades for workers, with streaming-friendly settings (buffering disabled, long timeouts).\n\n### Load Testing\n\nA ready-made load test script lives at [`extras/load-test.sh`](extras/load-test.sh). It uses `hey` if installed, falls back to `wrk`, and finally to parallel `curl` loops:\n\n```bash\n./extras/load-test.sh -n 200 -c 20 -m llama3-8b\n```\n\n### Shell Completions\n\nBoth `modelrelay-server` and `modelrelay-worker` can generate shell completion scripts via the hidden `--completions` flag:\n\n```bash\n# Bash\nmodelrelay-server --completions bash \u003e ~/.local/share/bash-completion/completions/modelrelay-server\nmodelrelay-worker --completions bash \u003e ~/.local/share/bash-completion/completions/modelrelay-worker\n\n# Zsh (add the target directory to $fpath)\nmodelrelay-server --completions zsh \u003e ~/.zfunc/_modelrelay-server\nmodelrelay-worker --completions zsh \u003e ~/.zfunc/_modelrelay-worker\n\n# Fish\nmodelrelay-server --completions fish \u003e ~/.config/fish/completions/modelrelay-server.fish\nmodelrelay-worker --completions fish \u003e ~/.config/fish/completions/modelrelay-worker.fish\n```\n\nSupported shells: `bash`, `zsh`, `fish`, `powershell`, `elvish`.\n\n## Documents\n\n\u003e **Full documentation:** [ericflo.github.io/modelrelay](https://ericflo.github.io/modelrelay/)\n\n- [Behavior contract](docs/behavior-contract.md) — the full specification of proxy, queue, streaming, and cancellation semantics\n- [Architecture sketch](docs/architecture.md) — how the pieces fit together internally\n- [Protocol walkthrough](docs/protocol-walkthrough.md) — ASCII wire traces for every message flow\n- [Operational runbook](docs/runbook.md) — health checks, draining, scaling, troubleshooting\n\n## Validation\n\nThe behavior matrix is exercised at three layers: black-box contract harnesses in `modelrelay-contract-tests`, live HTTP integration tests in `modelrelay-server`, and end-to-end live backend tests in `modelrelay-worker`.\n\n```bash\ncargo fmt --check\ncargo clippy --workspace --all-targets --all-features -- -D warnings\ncargo test --workspace\n```\n\n## Contributing\n\nBug reports, feature requests, and PRs are welcome. See\n[CONTRIBUTING.md](CONTRIBUTING.md) for code style, test expectations,\nbranch naming, and CI secrets.\n\nTo report a security vulnerability, follow the process in\n[SECURITY.md](SECURITY.md).\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericflo%2Fmodelrelay","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericflo%2Fmodelrelay","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericflo%2Fmodelrelay/lists"}