{"id":51165176,"url":"https://github.com/planetf1/oxllm","last_synced_at":"2026-06-26T18:30:32.173Z","repository":{"id":361443431,"uuid":"1254365824","full_name":"planetf1/oxllm","owner":"planetf1","description":"🦀 Ultra-minimalist, high-resilience LLM routing gateway in Rust. OpenAI-compatible with auto-retry, backoffs, circuit breakers, SIGHUP hot-reloads, and OOM-proof telemetry. Perfect companion for planetf1/otelite.","archived":false,"fork":false,"pushed_at":"2026-05-30T17:22:54.000Z","size":137,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-30T18:09:45.412Z","etag":null,"topics":["artificial-intelligence","circuit-breaker","distributed-tracing","edge-computing","embedded","failover","gateway","generative-ai","high-availability","hot-reload","lightweight","llm","openai-compatible","opentelemetry","openwrt","otelite","proxy","rate-limiting","rust","zero-disk"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/planetf1.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T13:27:00.000Z","updated_at":"2026-05-30T17:21:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/planetf1/oxllm","commit_stats":null,"previous_names":["planetf1/oxllm"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/planetf1/oxll
m","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/planetf1%2Foxllm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/planetf1%2Foxllm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/planetf1%2Foxllm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/planetf1%2Foxllm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/planetf1","download_url":"https://codeload.github.com/planetf1/oxllm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/planetf1%2Foxllm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34829415,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","circuit-breaker","distributed-tracing","edge-computing","embedded","failover","gateway","generative-ai","high-availability","hot-reload","lightweight","llm","openai-compatible","opentelemetry","openwrt","otelite","proxy","rate-limiting","rust","zero-disk"],"created_at":"2026-06-26T18:30:31.076Z","updated_at":"2026-06-26T18:30:32.161Z","avatar_url":"https://github.com/planetf1.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"#
 `oxllm` 🦀 (Oxide LLM Proxy)\n\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Rust](https://img.shields.io/badge/Rust-1.85.1%2B-orange.svg)](https://www.rust-lang.org/)\n[![CI](https://github.com/planetf1/oxllm/actions/workflows/ci.yml/badge.svg)](https://github.com/planetf1/oxllm/actions/workflows/ci.yml)\n[![crates.io](https://img.shields.io/crates/v/oxllm.svg)](https://crates.io/crates/oxllm)\n\n`oxllm` (Oxide LLM Proxy) is an ultra-minimalist, high-resilience adaptive routing LLM gateway written in Rust. It exposes an OpenAI-compatible interface, proxying requests to a tiered fallback pool of LLM providers with automatic rate-limit detection, circuit breakers, and failover.\n\nBuilt to operate entirely in memory with zero local disk persistence, `oxllm` is optimized for resource-constrained edge devices (like OpenWrt routers), developer workstations, and background daemons. The **stripped release binary is ~2.6 MB** and idle RAM usage is **~14 MB**.\n\n---\n\n## 🚀 Key Features\n\n* **Zero-Disk Dependency**: No SQLite, local caching, or file write operations during routing. State is strictly in memory.\n* **\u0026lt;2ms Routing Overhead**: Lock-free concurrency across routing loop, counters, and probe permits. Verified by CI benchmark.\n* **Adaptive Circuit Breaker**: Strict `HalfOpen` state machine with lock-free `probe_in_flight` atomic check-and-set. Rate limits and server errors trip per-provider circuits with exponential backoff. Idle-based penalty decay automatically rehabilitates providers.\n* **Tiered Failover**: Configure fallback chains across multiple providers. 
If the primary returns 429 or 5xx, the proxy transparently cascades to the next.\n* **Hot Config Reloading**: `SIGHUP` signal or `POST /reload` HTTP endpoint — parses updated `config.toml` and hot-swaps the provider pool via `tokio::sync::watch` without dropping connections.\n* **Local Stats Dashboard**: Every provider tracks request count, success count, token volumes, and last request time via lock-free atomics. Query via `oxllm status` or `curl /status` — no external collector needed.\n* **OOM-Proof Telemetry**: Bounded OTel event channel (1024 cap) with non-blocking `try_send` drops. If `otelite` is offline, telemetry degrades gracefully and the proxy keeps running.\n* **W3C Trace Context Propagation**: Extracts and injects `traceparent` headers for continuous trace spans.\n* **Dual-Stack IPv4/IPv6**: Configurable via `bind_family`: `\"ipv4\"` (default), `\"ipv6\"`, or `\"dual\"` for both.\n* **Unix-Style Environment Expansion**: Shell-style `${VAR}` replacement in TOML config values.\n* **Musl Cross-Compilation**: Pure-Rust `rustls-tls` stack avoids native OpenSSL linking on edge routers.\n* **OpenAI SDK Compatible** — JSON error format, CORS headers, and `x-request-id`\n  correlation ID on every response. Works with official OpenAI Python and JavaScript\n  SDKs, including browser-based usage.\n\n---\n\n## 🌐 CORS Support\n\nAll public endpoints return `Access-Control-Allow-Origin: *`\nheaders. 
Browser-based applications can call the proxy directly.\n\n---\n\n## 📦 Project Layout\n\n```\noxllm/\n├── Cargo.toml              # Workspace root\n├── config.toml             # Multi-tier cloud provider config (6 providers)\n├── config-local-test.toml  # Local-only Ollama config for testing\n├── crates/\n│   ├── oxllm-core/         # Core: config parsing, circuit breaker, router, telemetry\n│   └── oxllm/              # CLI: Axum server, routes, signal handling, admin API\n├── docs/\n│   ├── architecture.md     # Concurrency model, circuit breaker rules, telemetry\n│   └── providers.md        # Free-tier provider guide (snapshot: 2026-05-30)\n├── .github/workflows/      # CI, security, release, crates.io publish\n└── dist-workspace.toml     # cargo-dist release config\n```\n\n---\n\n## 🛠️ Installation\n\n### 1. Homebrew (easiest — pre-compiled binary)\n\n```bash\nbrew tap planetf1/homebrew-tap\nbrew install oxllm\n```\n\nPre-compiled for macOS and Linux (aarch64 + x86_64). No Rust toolchain needed. Binary size: ~2.6 MB stripped.\n\n### 2. Cargo (compiled from source)\n\n```bash\ncargo install oxllm\n```\n\nBuilds from [crates.io](https://crates.io/crates/oxllm). Requires Rust 1.85.1+.\n\n### 3. From source (latest main)\n\n```bash\ngit clone https://github.com/planetf1/oxllm.git\ncd oxllm\ncargo build --release\n./target/release/oxllm serve --config config-local-test.toml\n```\n\n### Default Config Location\n\n`oxllm serve` looks for config in this order:\n1. `--config \u003cpath\u003e` if provided\n2. `~/.config/oxllm/config.toml` (XDG base directory)\n3. 
`./config.toml` (current directory, for development)\n\n```bash\n# Quick start with local Ollama (no API keys needed):\ncp config-local-test.toml ~/.config/oxllm/config.toml\noxllm serve\n\n# Or with cloud providers (set env vars first):\nexport GROQ_API_KEY=\"gsk_...\"\nexport GOOGLE_API_KEY=\"AIza...\"\ncp config.toml ~/.config/oxllm/config.toml\noxllm serve\n```\n\n\n## 🚀 Quick Start\n\nThe primary use case is routing across **multiple free-tier cloud providers** with automatic failover.\nOllama can be added as a local fallback for testing or as a last resort.\n\n### 1. Set up providers\n\nThe repo includes two ready-to-use configs:\n\n- **`config.toml`** — 6 free-tier cloud providers with 2 virtual model tiers\n- **`config-local-test.toml`** — local Ollama only (for testing)\n\nFor the cloud config, set your API keys (see [Provider Guide](docs/providers.md) for sign-up links):\n\n```bash\nexport GROQ_API_KEY=\"gsk_...\"\nexport GOOGLE_API_KEY=\"AIza...\"\nexport SAMBANOVA_API_KEY=\"...\"\nexport OPENROUTER_API_KEY=\"sk-or-...\"\n```\n\n### 2. Start the proxy\n\n```bash\noxllm serve --config config.toml\n```\n\n### 3. 
Test it\n\n```bash\n# Smart model (strongest available — cascades through providers on failure)\ncurl -X POST http://127.0.0.1:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"smart\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n\n# Basic model (fast, cheap, high rate limits)\ncurl -X POST http://127.0.0.1:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"basic\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n\n# Embeddings\ncurl -X POST http://127.0.0.1:8080/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"basic\", \"input\": \"hello world\"}'\n\n# Live dashboard (no external collector needed)\ncurl http://127.0.0.1:8080/status\n```\n\nFor local testing with Ollama instead of cloud providers:\n```bash\noxllm serve --config config-local-test.toml\n```\n\n## ⚙️ Configuration\n\n### Server Options\n\n| Field | Default | Description |\n|---|---|---|\n| `host` | `\"127.0.0.1\"` | Bind address (not used when `bind_family` is `ipv6`/`dual`) |\n| `port` | `8080` | Listen port |\n| `otel_endpoint` | — | OTLP HTTP endpoint (e.g. `http://127.0.0.1:4318`). If unreachable, proxy starts without telemetry. Records spans with GenAI semantic attributes, 3 metrics (provider status gauge, request duration histogram, token counter), and W3C trace context propagation. See [architecture docs](docs/architecture.md#4-telemetry-layer--trace-context-propagation). |\n| `upstream_timeout_secs` | `5` | Upstream request timeout in seconds |\n| `bind_family` | `\"ipv4\"` | Address family: `\"ipv4\"`, `\"ipv6\"`, or `\"dual\"` (both) |\n\n### Provider Definition\n\nEach provider requires `name`, `enabled`, `base_url` (with trailing `/v1/`), `api_key` (or `${VAR}` env reference), and `models` list.\n\n### Virtual Models (Fallback Chains)\n\nVirtual models define the routing order. 
If a provider returns 429 or 5xx, the proxy transparently tries the next:\n\n```toml\n[virtual_models]\nsmart = [\n  { provider = \"groq-strong\",  model = \"llama-3.3-70b-versatile\" },\n  { provider = \"groq-basic\",   model = \"meta-llama/llama-4-scout-17b-16e-instruct\" },\n  { provider = \"ollama-fallback\", model = \"granite4.1:3b\" },\n]\n```\n\n### How the Routing Algorithm Works\n\n1. When a request arrives, the proxy iterates the virtual model's provider list in order.\n2. For each provider, it checks: **circuit breaker state** (Closed? Open? HalfOpen?), **rate-limit window** (cooling down?), **manual override** (admin-disabled?).\n3. The first healthy provider is selected for the request.\n4. On success: circuit resets to Closed, failure count drops to 0.\n5. On 429 (rate limit): sets a cooldown timer based on `retry-after` header (default 30s). After 3 failures, circuit opens.\n6. On 5xx: increments failure counter. After 3 failures, circuit opens for **60 × 2^(failures-3)** seconds.\n7. **HalfOpen probes**: After cooldown expires, a single probe request is allowed. Only one concurrent probe — others bypass via atomic `compare_exchange`.\n8. **Idle decay**: Every 5 minutes without a request, failure count decreases by 1. 
Below 3 failures, Open circuits automatically rehabilitate to Closed.\n\n### Example Configs\n\n- `config.toml` — 6 cloud providers across 2 tiers (smart + basic)\n- `config-local-test.toml` — local Ollama only, zero API keys\n\n---\n\n## 📟 CLI Subcommands\n\n```bash\n# Start the proxy\noxllm serve                          # default: ~/.config/oxllm/config.toml\noxllm serve -v                       # verbose: per-request routing info\noxllm serve -vv                      # trace: full request/response dump\n\n# Validate config syntax\noxllm validate                       # checks env vars, provider cross-refs\n\n# Live dashboard (no external collector needed)\noxllm status                         # virtual model routing table + per-provider counters\n\n# Manage providers at runtime\noxllm provider list                  # condensed provider status table\noxllm provider offline \u003cname\u003e        # take a provider out of rotation\noxllm provider online \u003cname\u003e         # re-enable a disabled provider\noxllm provider reset \u003cname\u003e          # clear circuit breaker, failures, rate limit\n\n# Config hot-reload (SIGHUP)\noxllm reload\n\n# Graceful stop (drains in-flight SSE streams)\noxllm stop\n```\n\n### Example `oxllm status` Output (after ~5 hours of real use)\n\n```\nUptime: 311m 3s  |  Total Requests: 150\n\nVirtual Model: smart\n-------------------------------------------------------------------------------------------------------------------------------\n| Provider             | Model                                         | Circuit                        | Requests |  Success |\n-------------------------------------------------------------------------------------------------------------------------------\n| groq-strong          | llama-3.3-70b-versatile                       | Open (197s cooldown)           |       16 |        0 |\n| sambanova-strong     | Llama-4-Maverick-17B-128E-Instruct            | Closed (Healthy)               |      
 30 |        8 |\n| groq-basic           | meta-llama/llama-4-scout-17b-16e-instruct     | Open (225s cooldown)           |       30 |       17 |\n| google-basic         | gemini-2.5-flash                              | Closed (Healthy)               |       32 |       22 |\n| sambanova-basic      | DeepSeek-V3.1                                 | Closed (Healthy)               |       15 |       10 |\n| openrouter-basic     | ibm-granite/granite-4.1-8b                    | Closed (Healthy)               |       27 |       27 |\n| ollama-fallback      | granite4.1:3b                                 | Closed (Healthy)               |        0 |        0 |\n\nVirtual Model: basic\n-------------------------------------------------------------------------------------------------------------------------------\n| Provider             | Model                                         | Circuit                        | Requests |  Success |\n-------------------------------------------------------------------------------------------------------------------------------\n| groq-basic           | meta-llama/llama-4-scout-17b-16e-instruct     | Open (225s cooldown)           |       30 |       17 |\n| google-basic         | gemini-2.5-flash                              | Closed (Healthy)               |       32 |       22 |\n| sambanova-basic      | DeepSeek-V3.1                                 | Closed (Healthy)               |       15 |       10 |\n| openrouter-basic     | ibm-granite/granite-4.1-8b                    | Closed (Healthy)               |       27 |       27 |\n| ollama-fallback      | granite4.1:3b                                 | Closed (Healthy)               |        0 |        0 |\n\nUse 'oxllm provider offline \u003cname\u003e' to take a provider out of rotation.\nUse 'oxllm provider reset \u003cname\u003e' to clear circuit breaker state.\n```\n\nPiping through `cat` or a pager adds the full per-provider counter table with failure counts, token volumes, 
and last-request timestamps:\n\n```\n+--------------------+-----------------------------------------------+--------------------------------+----------+---------------+----------+-----------+--------------+---------------+--------------+\n| Provider Name      | Models                                                | Circuit Breaker State          | Failures | Rate Limited? | Requests | Successes | Tokens Input | Tokens Output | Last Request|\n+--------------------+-----------------------------------------------+--------------------------------+----------+---------------+----------+-----------+--------------+---------------+--------------+\n| groq-strong        | llama-3.3-70b-versatile                       | Open (Cooldown: 197s left)     | 5        | No            | 16       | 0         | 0            | 0             | Just now     |\n| sambanova-strong   | Llama-4-Maverick-17B-128E-Instruct            | Closed (Healthy)               | 13       | No            | 30       | 8         | 94           | 4             | Just now     |\n| groq-basic         | meta-llama/llama-4-scout-17b-16e-instruct     | Open (Cooldown: 225s left)     | 5        | No            | 30       | 17        | 232          | 10            | Just now     |\n| google-basic       | gemini-2.5-flash                              | Closed (Healthy)               | 8        | No            | 32       | 22        | 0            | 0             | Just now     |\n| sambanova-basic    | DeepSeek-V3.1                                 | Closed (Healthy)               | 1        | Yes           | 15       | 10        | 0            | 0             | Just now     |\n| openrouter-basic   | ibm-granite/granite-4.1-8b                    | Closed (Healthy)               | 0        | No            | 27       | 27        | 0            | 0             | Just now     |\n| ollama-fallback    | granite4.1:3b                                 | Closed (Healthy)               | 0        | No            | 0        | 0    
     | 0            | 0             | Never        |\n+--------------------+-----------------------------------------------+--------------------------------+----------+---------------+----------+-----------+--------------+---------------+--------------+\n```\n\nThis example — captured after 5 hours of real use — shows:\n- **groq-strong**: Circuit is *Open* (197s cooldown remaining) after 5 failures with 0 successes across 16 requests, meaning all attempts hit rate limits or errors.\n- **groq-basic**: Also *Open* (225s cooldown) after 5 failures, but 17 of 30 requests succeeded before the circuit tripped.\n- **sambanova-strong**: *Closed* and healthy but with 13 failures; it's been reliable enough to stay in rotation despite a high error rate.\n- **openrouter-basic**: Perfect record — 27/27 requests succeeded, 0 failures, circuit Closed.\n- **sambanova-basic**: Currently *rate-limited* (1 failure, marked \"Yes\"), but the circuit remains Closed.\n- **ollama-fallback**: Never used (0 requests), sitting idle as the last-resort local model.\n\nAll admin endpoints (`/health`, `/status`, `/reload`, `/admin/*`) are restricted to localhost — external callers receive `403 Forbidden`.\n\n---\n\n## 📊 Telemetry\n\noxllm exports OpenTelemetry (OTel) traces and metrics via OTLP/HTTP JSON to a collector like [otelite](https://github.com/planetf1/otelite).\n\n### Configuration\n\nSet `otel_endpoint` in `[server]` to point at your OTLP HTTP collector:\n\n```toml\n[server]\notel_endpoint = \"http://127.0.0.1:4318\"\n```\n\nIf the endpoint is unreachable or not configured, oxllm logs a warning and starts\ndegraded — telemetry events are silently discarded. 
The proxy always works\nwithout a collector.\n\n### Span Attributes (Traces)\n\nEvery routed transaction generates a span with GenAI semantic conventions:\n\n| Attribute | Example | Description |\n|---|---|---|\n| `gen_ai.operation.name` | `chat` / `embeddings` | Operation type |\n| `gen_ai.provider.name` | `groq-strong` | Provider selected |\n| `gen_ai.request.model` | `llama-3.3-70b-versatile` | Model used |\n| `gen_ai.usage.input_tokens` | `1420` | Input token count |\n| `gen_ai.usage.output_tokens` | `312` | Output token count |\n| `proxy.attempts_required` | `2` | How many providers were tried |\n| `proxy.initial_failure_reason` | `429_too_many_requests` | First failure cause (if any) |\n\nSpans are linked to incoming W3C `traceparent` headers when present.\n\n### Metrics\n\n| Metric | Type | Description |\n|---|---|---|\n| `llm_proxy.provider.status` | Gauge | `0` = healthy, `1` = rate-limited, `2` = circuit tripped |\n| `llm_proxy.request.duration` | Histogram | Request lifecycle duration (ms) |\n| `llm_proxy.tokens.consumed` | Counter | Cumulative tokens by provider, model, type |\n\n### Logging\n\nLogs are emitted via `tracing` to stdout with `EnvFilter` support:\n\n- **Default**: `info` — server start/stop, circuit transitions, errors\n- **`-v`**: `debug` — adds per-request routing info\n- **`-vv`**: `trace` — full request/response details\n\nOverride via `RUST_LOG` env var:\n```bash\nexport RUST_LOG=oxllm=debug,oxllm_core=info\noxllm serve\n```\n\n## 📄 License\n\nLicensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplanetf1%2Foxllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplanetf1%2Foxllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplanetf1%2Foxllm/lists"}