https://github.com/unohee/freerouter

OpenAI-compatible proxy & library that auto-routes across OpenRouter's free models, with fallback on rate limits. Run it as a server or import it as a Python submodule.

api-gateway fastapi free-models llm openai openrouter proxy python

# freerouter

[![CI](https://github.com/unohee/freerouter/actions/workflows/ci.yml/badge.svg)](https://github.com/unohee/freerouter/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

An OpenAI-compatible proxy that gathers **only OpenRouter's free model endpoints** and
routes/falls back across them automatically.

Point any existing OpenAI SDK/client at freerouter, and it picks a model from OpenRouter's
free pool (`pricing.prompt == 0 && pricing.completion == 0`), then **falls back to the next
free model automatically** on a rate limit (429) or a transient error.
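
The free-pool condition above can be sketched in a few lines. This is an illustration rather than freerouter's actual code: it assumes each catalog entry carries a `pricing` object whose `prompt` and `completion` values may arrive as decimal strings, and `is_free` plus the sample model IDs are hypothetical names.

```python
from decimal import Decimal, InvalidOperation


def is_free(model: dict) -> bool:
    """Return True when both prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        return (
            Decimal(str(pricing["prompt"])) == 0
            and Decimal(str(pricing["completion"])) == 0
        )
    except (KeyError, InvalidOperation):
        # A missing or unparsable price is treated as "not free".
        return False


# Hypothetical catalog entries, shaped like the assumption described above.
catalog = [
    {"id": "example/model-a:free", "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/model-b", "pricing": {"prompt": "0.000001", "completion": "0.000002"}},
    {"id": "example/model-c", "pricing": {"prompt": "0"}},  # completion price missing
]
free_ids = [m["id"] for m in catalog if is_free(m)]
```

Parsing with `Decimal` instead of comparing strings means `"0"`, `"0.0"`, and `0` are all treated as free in this sketch.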

## How it works

1. On startup, fetch OpenRouter `/api/v1/models`, **keep only free models**, and cache them
   (TTL 600s by default).
2. On `POST /v1/chat/completions`, the router builds a candidate order.
   - A specific free `model` goes first; otherwise (`auto`/empty/paid/unknown) the free-pool
     priority is used (`:free`-suffixed first, then larger context).
   - A model that just returned 429 is moved to the back of the order during its cooldown
     (60s by default).
3. Try candidates in order (at most `MAX_ATTEMPTS`, 8 by default), falling back on
   429/402/404/5xx. If all fail, return 502.
4. The response header `X-freerouter-Model` carries the model actually used.
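
The ordering and fallback steps above can be sketched as follows. This is a simplified model of the described behaviour, not the project's router: `candidate_order`, `route`, the `send` callback, and the `context_length` field are hypothetical. It also assumes that a cooling model is moved to the back even when it was requested by name.

```python
import time

RETRYABLE = {429, 402, 404}


def is_retryable(status: int) -> bool:
    """Statuses that trigger a fallback: 429, 402, 404, or any 5xx."""
    return status in RETRYABLE or 500 <= status <= 599


def candidate_order(requested, pool, cooled_until, now):
    """Rank pool entries (dicts with 'id' and 'context_length')."""
    # ':free'-suffixed models first, then larger context windows.
    ranked = sorted(
        pool,
        key=lambda m: (not m["id"].endswith(":free"), -m["context_length"]),
    )
    ids = [m["id"] for m in ranked]
    if requested in ids:  # a specific free model goes first
        ids.remove(requested)
        ids.insert(0, requested)
    # Stable partition: models still in cooldown go to the back.
    ready = [i for i in ids if cooled_until.get(i, 0) <= now]
    cooling = [i for i in ids if cooled_until.get(i, 0) > now]
    return ready + cooling


def route(requested, pool, send, cooled_until,
          cooldown=60.0, max_attempts=8, clock=time.monotonic):
    """Try candidates in order; send(model_id) returns (status, body)."""
    order = candidate_order(requested, pool, cooled_until, clock())
    for model_id in order[:max_attempts]:
        status, body = send(model_id)
        if status == 429:
            cooled_until[model_id] = clock() + cooldown
        if not is_retryable(status):
            return status, body, model_id  # success or non-retryable error
    return 502, {"error": "all free candidates failed"}, None


# Demo with a fake upstream: the largest ':free' model is rate limited.
pool = [
    {"id": "example/small:free", "context_length": 32768},
    {"id": "example/large:free", "context_length": 262144},
    {"id": "example/unsuffixed", "context_length": 1048576},
]
cooled = {}
fake_send = lambda mid: (429, {}) if mid == "example/large:free" else (200, {"ok": True})
status, body, used = route("auto", pool, fake_send, cooled, clock=lambda: 100.0)
```

In the demo, `example/large:free` is tried first, returns 429, and enters cooldown, so the request is served by `example/small:free`. The third return value plays the role of the `X-freerouter-Model` header.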

## Free models

The routable pool is OpenRouter's free, chat-capable models. The list below is generated from
the live catalog — run `python scripts/update_models.py` to refresh it locally, and CI refreshes
it automatically once a week.

_26 free, chat-capable models (prompt + completion priced at $0). Auto-generated by `scripts/update_models.py`._

| Model ID | Context |
|----------|---------|
| `qwen/qwen3-coder:free` | 1,048,576 |
| `nvidia/nemotron-3-ultra-550b-a55b:free` | 1,000,000 |
| `nvidia/nemotron-3-super-120b-a12b:free` | 1,000,000 |
| `poolside/laguna-xs.2:free` | 262,144 |
| `poolside/laguna-m.1:free` | 262,144 |
| `google/gemma-4-26b-a4b-it:free` | 262,144 |
| `google/gemma-4-31b-it:free` | 262,144 |
| `qwen/qwen3-next-80b-a3b-instruct:free` | 262,144 |
| `cohere/north-mini-code:free` | 256,000 |
| `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` | 256,000 |
| `nvidia/nemotron-3-nano-30b-a3b:free` | 256,000 |
| `openai/gpt-oss-120b:free` | 131,072 |
| `openai/gpt-oss-20b:free` | 131,072 |
| `meta-llama/llama-3.3-70b-instruct:free` | 131,072 |
| `meta-llama/llama-3.2-3b-instruct:free` | 131,072 |
| `nousresearch/hermes-3-llama-3.1-405b:free` | 131,072 |
| `nvidia/nemotron-3.5-content-safety:free` | 128,000 |
| `nvidia/nemotron-nano-12b-v2-vl:free` | 128,000 |
| `nvidia/nemotron-nano-9b-v2:free` | 128,000 |
| `liquid/lfm-2.5-1.2b-thinking:free` | 32,768 |
| `liquid/lfm-2.5-1.2b-instruct:free` | 32,768 |
| `cognitivecomputations/dolphin-mistral-24b-venice-edition:free` | 32,768 |
| `openrouter/owl-alpha` | 1,048,756 |
| `google/lyria-3-pro-preview` | 1,048,576 |
| `google/lyria-3-clip-preview` | 1,048,576 |
| `openrouter/free` | 200,000 |

## Install

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env # fill in OPENROUTER_API_KEY
```

## Run

```bash
freerouter # or: python -m freerouter
# defaults to http://127.0.0.1:8000
```

## Usage (with the OpenAI SDK as-is)

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="auto",  # freerouter picks from the free pool automatically
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```

curl:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
```
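
To see which model actually served a request, a client can read the `X-freerouter-Model` response header. The sketch below uses only the standard library and assumes the proxy is running on the default address. `build_request` and `send_request` are hypothetical helper names, not part of freerouter.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_request(prompt: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST as the curl example above."""
    payload = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(req: urllib.request.Request) -> Tuple[Optional[str], dict]:
    """Send the request; return (model actually used, parsed JSON body)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        used = resp.headers.get("X-freerouter-Model")  # header lookup is case-insensitive
        body = json.load(resp)
    return used, body


req = build_request("hi")  # building the request does not touch the network
```

With the proxy running, `used, body = send_request(req)` returns the chosen model ID next to the usual completion payload.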

## Use as a library (no server)

You can embed freerouter into another agent program **as a submodule** without running an
HTTP server. `FreeRouterClient` fetches the free-model list over REST (TTL-cached) and routes
with fallback in-process.

```python
import asyncio
from freerouter import FreeRouterClient

async def main():
    async with FreeRouterClient() as fr:  # uses OPENROUTER_API_KEY from .env
        free = await fr.models()  # routable free models
        data = await fr.chat(
            [{"role": "user", "content": "hello"}],
            model="auto",  # auto = pick from the free pool + fallback
            max_tokens=64,
        )
        print(data["model"], data["choices"][0]["message"]["content"])

asyncio.run(main())
```

In synchronous code (outside an event loop), use the sync wrapper:

```python
from freerouter import FreeRouterClient

fr = FreeRouterClient(api_key="sk-or-...") # passing the key directly also works
data = fr.chat_sync([{"role": "user", "content": "hi"}], model="auto")
```

- **Share an httpx client**: if the agent already uses an `httpx.AsyncClient`, inject it —
`FreeRouterClient(http_client=my_client)`. When injected, the caller owns its lifecycle.
- **Share state**: inject `registry`/`router` so several clients share the free-list cache and
cooldowns.
- **Streaming**: `async for chunk in fr.stream_raw(payload)` — raw SSE bytes.
- **Errors**: a non-retryable 4xx (e.g. a bad request) raises `httpx.HTTPStatusError`; when
  every candidate fails, `FreeRouterError` is raised.

> The same routing/fallback core is shared by the proxy server (`proxy.py`) and the library.
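
Because `stream_raw` yields raw SSE bytes, the caller has to split them into events. The sketch below is one way to do that, assuming OpenAI-style streams in which each event is a single `data: {...}` line and the stream ends with `data: [DONE]`. `iter_sse_json` and `fake_stream` are hypothetical names, and multi-line `data:` events are not handled.

```python
import asyncio
import json
from typing import AsyncIterator


async def iter_sse_json(byte_chunks: AsyncIterator[bytes]) -> AsyncIterator[dict]:
    """Yield the parsed JSON payload of each 'data:' line in an SSE byte stream."""
    buffer = b""
    async for chunk in byte_chunks:
        buffer += chunk
        # Only complete lines are decoded; a partial line stays in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if not text.startswith("data:"):
                continue  # blank separators and ':' comment lines
            data = text[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield json.loads(data)


async def fake_stream():
    # Stand-in for fr.stream_raw(payload); the chunk boundary falls mid-line.
    yield b'data: {"choices":[{"delta":{"content":"Hel'
    yield b'lo"}}]}\n\n: keep-alive\n\ndata: [DONE]\n\n'


async def collect() -> str:
    parts = [
        event["choices"][0]["delta"].get("content", "")
        async for event in iter_sse_json(fake_stream())
    ]
    return "".join(parts)


text = asyncio.run(collect())
```

With the library, the same helper would wrap the real stream: `async for event in iter_sse_json(fr.stream_raw(payload))`.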

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/chat/completions` | OpenAI-compatible. Supports streaming (`"stream": true`). |
| GET | `/v1/models` | Exposes only routable free models. |
| GET | `/health` | Health check. |

## Configuration

Controlled via `.env` or environment variables. See `src/freerouter/config.py` for the full list.

| Key | Default | Description |
|-----|---------|-------------|
| `OPENROUTER_API_KEY` | (required) | OpenRouter key |
| `MODEL_REFRESH_TTL` | 600 | free-list cache TTL (seconds) |
| `MAX_ATTEMPTS` | 8 | max number of models to try as fallbacks |
| `COOLDOWN_SECONDS` | 60 | how long to skip a model after a 429 (seconds) |
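
As a rough picture of how these keys and defaults could map onto typed settings, here is a minimal sketch. It is not the contents of `src/freerouter/config.py`: `Settings` and `load_settings` are hypothetical, and the sketch reads only a mapping such as `os.environ`, without parsing `.env` files.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    model_refresh_ttl: float = 600.0  # free-list cache TTL, seconds
    max_attempts: int = 8             # max models tried per request
    cooldown_seconds: float = 60.0    # skip window after a 429, seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented keys, falling back to the documented defaults."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        model_refresh_ttl=float(env.get("MODEL_REFRESH_TTL", 600)),
        max_attempts=int(env.get("MAX_ATTEMPTS", 8)),
        cooldown_seconds=float(env.get("COOLDOWN_SECONDS", 60)),
    )


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"OPENROUTER_API_KEY": "sk-or-example", "MAX_ATTEMPTS": "3"})
```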

## Tests

```bash
pytest
```

## License

MIT — see [LICENSE](LICENSE).