https://github.com/unohee/freerouter

OpenAI-compatible proxy & library that auto-routes across OpenRouter's free models, with fallback on rate limits. Run it as a server or import it as a Python submodule.
api-gateway fastapi free-models llm openai openrouter proxy python

Host: GitHub
URL: https://github.com/unohee/freerouter
Owner: unohee
License: mit
Created: 2026-06-30T06:34:36.000Z (5 days ago)
Default Branch: main
Last Pushed: 2026-06-30T06:51:49.000Z (5 days ago)
Last Synced: 2026-06-30T08:24:21.022Z (5 days ago)
Topics: api-gateway, fastapi, free-models, llm, openai, openrouter, proxy, python
Language: Python
Homepage:
Size: 33.2 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# freerouter

[![CI](https://github.com/unohee/freerouter/actions/workflows/ci.yml/badge.svg)](https://github.com/unohee/freerouter/actions/workflows/ci.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

An OpenAI-compatible proxy that gathers **only OpenRouter's free model endpoints** and
routes/falls back across them automatically.

Point any existing OpenAI SDK/client at freerouter, and it picks a model from OpenRouter's
free pool (`pricing.prompt == 0 && pricing.completion == 0`), then **falls back to the next
free model automatically** on a rate limit (429) or a transient error.
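
The free-pool condition above can be sketched as a small predicate. This is a minimal illustration, not freerouter's actual code: it assumes each catalog entry carries a `pricing` object whose `prompt` and `completion` values are numeric strings, and the `is_free` helper and sample entries are hypothetical.

```python
from typing import Any


def is_free(model: dict[str, Any]) -> bool:
    """Return True when both prompt and completion prices are zero."""
    pricing = model.get("pricing") or {}
    try:
        # Prices are assumed to arrive as strings such as "0" or "0.000002".
        return float(pricing["prompt"]) == 0 and float(pricing["completion"]) == 0
    except (KeyError, TypeError, ValueError):
        # Missing or malformed pricing is treated as "not free".
        return False


# Hypothetical catalog entries used only for illustration.
catalog = [
    {"id": "example/model-a:free", "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/model-b", "pricing": {"prompt": "0.000002", "completion": "0.000006"}},
    {"id": "example/model-c", "pricing": {"prompt": "0", "completion": "0.000001"}},
    {"id": "example/model-d"},
]

free_ids = [m["id"] for m in catalog if is_free(m)]
print(free_ids)  # ['example/model-a:free']
```

Treating missing pricing as paid is a conservative choice: a model is only routed to when both prices are known to be zero.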

## How it works

1. On startup, fetch OpenRouter `/api/v1/models`, **keep only free models**, and cache them
   (TTL 600s by default).
2. On `POST /v1/chat/completions`, the router builds a candidate order.
   - A specific free `model` goes first; otherwise (`auto`/empty/paid/unknown) the free-pool
     priority is used (`:free`-suffixed first, then larger context).
   - A model that just returned 429 is pushed back during its cooldown (60s by default).
3. Try candidates in order, falling back on 429/402/404/5xx. If all fail, return 502.
4. The response header `X-freerouter-Model` carries the model actually used.
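
Steps 2 and 3 can be sketched roughly as follows. This is an illustrative reading of the rules above, not the project's router: `build_candidates`, `route`, the `send` callback, and the sample pool are all hypothetical names, and the sketch assumes a requested model that is cooling down is pushed back like any other.

```python
import time
from typing import Callable, Optional


def is_retryable(status: int) -> bool:
    """Statuses that trigger a fallback: 429, 402, 404, and any 5xx."""
    return status in {429, 402, 404} or 500 <= status <= 599


def build_candidates(requested: Optional[str], pool: list[dict],
                     cooldown_until: dict[str, float], now: float) -> list[str]:
    """Order free models: requested first, then `:free` suffix, then context size."""
    ranked = sorted(
        pool,
        key=lambda m: (not m["id"].endswith(":free"), -m["context_length"]),
    )
    ids = [m["id"] for m in ranked]
    if requested in ids:  # "auto", empty, paid, or unknown names fall through
        ids.remove(requested)
        ids.insert(0, requested)
    # Stable sort: models still cooling down move to the back, order otherwise kept.
    return sorted(ids, key=lambda i: cooldown_until.get(i, 0.0) > now)


def route(candidates: list[str], send: Callable[[str], tuple[int, str]],
          cooldown_until: dict[str, float], max_attempts: int = 8,
          cooldown_seconds: float = 60.0) -> tuple[int, str, Optional[str]]:
    """Try candidates in order and return (status, body, model_used)."""
    for model in candidates[:max_attempts]:
        status, body = send(model)
        if status == 429:
            cooldown_until[model] = time.monotonic() + cooldown_seconds
        if not is_retryable(status):
            return status, body, model
    return 502, "all free candidates failed", None


# Hypothetical pool and upstream stub for demonstration.
pool = [
    {"id": "example/small:free", "context_length": 32_768},
    {"id": "example/big:free", "context_length": 262_144},
    {"id": "example/unsuffixed", "context_length": 1_000_000},
]
cooldowns: dict[str, float] = {}


def fake_send(model: str) -> tuple[int, str]:
    # Stand-in for the upstream call: the largest `:free` model is rate limited.
    return (429, "rate limited") if model == "example/big:free" else (200, "ok")


order = build_candidates("auto", pool, cooldowns, time.monotonic())
print(order)
print(route(order, fake_send, cooldowns))
```

After the run above, `example/big:free` is in cooldown, so the next call to `build_candidates` places it last until the cooldown expires.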

## Free models

The routable pool is OpenRouter's free, chat-capable models. The list below is generated from
the live catalog: run `python scripts/update_models.py` to refresh it locally; CI refreshes
it automatically once a week.

_26 free, chat-capable models (prompt + completion priced at $0). Auto-generated by `scripts/update_models.py`._

| Model ID | Context |
|----------|---------|
| `qwen/qwen3-coder:free` | 1,048,576 |
| `nvidia/nemotron-3-ultra-550b-a55b:free` | 1,000,000 |
| `nvidia/nemotron-3-super-120b-a12b:free` | 1,000,000 |
| `poolside/laguna-xs.2:free` | 262,144 |
| `poolside/laguna-m.1:free` | 262,144 |
| `google/gemma-4-26b-a4b-it:free` | 262,144 |
| `google/gemma-4-31b-it:free` | 262,144 |
| `qwen/qwen3-next-80b-a3b-instruct:free` | 262,144 |
| `cohere/north-mini-code:free` | 256,000 |
| `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` | 256,000 |
| `nvidia/nemotron-3-nano-30b-a3b:free` | 256,000 |
| `openai/gpt-oss-120b:free` | 131,072 |
| `openai/gpt-oss-20b:free` | 131,072 |
| `meta-llama/llama-3.3-70b-instruct:free` | 131,072 |
| `meta-llama/llama-3.2-3b-instruct:free` | 131,072 |
| `nousresearch/hermes-3-llama-3.1-405b:free` | 131,072 |
| `nvidia/nemotron-3.5-content-safety:free` | 128,000 |
| `nvidia/nemotron-nano-12b-v2-vl:free` | 128,000 |
| `nvidia/nemotron-nano-9b-v2:free` | 128,000 |
| `liquid/lfm-2.5-1.2b-thinking:free` | 32,768 |
| `liquid/lfm-2.5-1.2b-instruct:free` | 32,768 |
| `cognitivecomputations/dolphin-mistral-24b-venice-edition:free` | 32,768 |
| `openrouter/owl-alpha` | 1,048,756 |
| `google/lyria-3-pro-preview` | 1,048,576 |
| `google/lyria-3-clip-preview` | 1,048,576 |
| `openrouter/free` | 200,000 |
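
For reference, a table in this shape can be rendered from a list of model records in a few lines. This is only a sketch of the formatting step; `render_table` is a hypothetical helper and is not taken from `scripts/update_models.py`.

```python
def render_table(models: list[dict]) -> str:
    """Render a two-column Markdown table of model IDs and context lengths."""
    lines = ["| Model ID | Context |", "|----------|---------|"]
    for m in models:
        # The `:,` format spec adds thousands separators, e.g. 1048576 -> 1,048,576.
        lines.append(f"| `{m['id']}` | {m['context_length']:,} |")
    return "\n".join(lines)


table = render_table([{"id": "qwen/qwen3-coder:free", "context_length": 1048576}])
print(table)
```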

## Install

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in OPENROUTER_API_KEY
```

## Run

```bash
freerouter            # or: python -m freerouter
# defaults to http://127.0.0.1:8000
```

## Usage (with the OpenAI SDK as-is)

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="auto",  # freerouter picks from the free pool automatically
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```

curl:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
```
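
To see which model actually served a request, a client can read the `X-freerouter-Model` response header. The sketch below uses only the standard library; `build_request` and `send` are hypothetical helper names, and the base URL assumes the default local address.

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, content: str, model: str = "auto") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> tuple[Optional[str], dict]:
    """Send the request; return (model actually used, parsed JSON body).

    urlopen raises urllib.error.HTTPError on a 4xx/5xx status, such as the
    502 returned when every candidate fails.
    """
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Header lookup is case-insensitive; None if the server did not set it.
        return resp.headers.get("X-freerouter-Model"), json.loads(resp.read())


req = build_request("http://127.0.0.1:8000", "hi")
print(req.get_method(), req.full_url)
```

With a freerouter server running, `used, body = send(req)` gives the routed model ID in `used` and the usual chat completion JSON in `body`.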

## Use as a library (no server)

You can embed freerouter into another agent program **as a submodule** without running an
HTTP server. `FreeRouterClient` fetches the free-model list over REST (TTL-cached) and routes
with fallback in-process.

```python
import asyncio

from freerouter import FreeRouterClient


async def main():
    async with FreeRouterClient() as fr:           # uses OPENROUTER_API_KEY from .env
        free = await fr.models()                    # routable free models
        data = await fr.chat(
            [{"role": "user", "content": "hello"}],
            model="auto",                           # auto = pick from the free pool + fallback
            max_tokens=64,
        )
        print(data["model"], data["choices"][0]["message"]["content"])


asyncio.run(main())
```

In synchronous code (outside an event loop), use the sync wrapper:

```python
from freerouter import FreeRouterClient

fr = FreeRouterClient(api_key="sk-or-...")          # passing the key directly also works
data = fr.chat_sync([{"role": "user", "content": "hi"}], model="auto")
```

- **Share an httpx client**: if the agent already uses an `httpx.AsyncClient`, inject it with
  `FreeRouterClient(http_client=my_client)`. When injected, the caller owns its lifecycle.
- **Share state**: inject `registry`/`router` so several clients share the free-list cache and
  cooldowns.
- **Streaming**: `async for chunk in fr.stream_raw(payload)` yields raw SSE bytes.
- **Errors**: a non-retryable 4xx (e.g. a bad request) raises `httpx.HTTPStatusError`; if every
  candidate fails, `FreeRouterError` is raised.

> The same routing/fallback core is shared by the proxy server (`proxy.py`) and the library.
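
Because `stream_raw` yields raw SSE bytes, the caller has to split them into events. The sketch below shows one way to do that over any iterable of byte chunks. It assumes the OpenAI-style stream format (`data: {...}` lines ending with a `data: [DONE]` sentinel); `iter_sse_data` and `iter_text` are hypothetical helpers, not part of freerouter. An async caller would apply the same buffering inside its `async for` loop.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each `data:` line from a stream of raw SSE bytes."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A network chunk may end mid-line, so only complete lines are consumed.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            # Blank separators and `:` comment lines are skipped.
            if text.startswith("data:"):
                yield text[len("data:"):].strip()


def iter_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield content deltas until the `[DONE]` sentinel."""
    for data in iter_sse_data(chunks):
        if data == "[DONE]":
            return
        choices = json.loads(data).get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content


# Hypothetical byte chunks; the second event is split across two chunks.
raw = [
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\nda',
    b'ta: {"choices":[{"delta":{"content":"lo"}}]}\n\n',
    b": keep-alive comment\n\n",
    b"data: [DONE]\n\n",
]
print("".join(iter_text(raw)))  # Hello
```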

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/chat/completions` | OpenAI-compatible. Supports streaming (`"stream": true`). |
| GET | `/v1/models` | Exposes only routable free models. |
| GET | `/health` | Health check. |

## Configuration

Controlled via `.env` or environment variables. See `src/freerouter/config.py` for the full list.

| Key | Default | Description |
|-----|---------|-------------|
| `OPENROUTER_API_KEY` | (required) | OpenRouter key |
| `MODEL_REFRESH_TTL` | 600 | free-list cache TTL (seconds) |
| `MAX_ATTEMPTS` | 8 | max number of models to try as fallbacks |
| `COOLDOWN_SECONDS` | 60 | how long to skip a model after a 429 (seconds) |
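
As a rough illustration of how these keys and defaults fit together, the sketch below reads them from a mapping of environment variables. It is an assumption-laden stand-in, not the contents of `src/freerouter/config.py`: the `Settings` dataclass and `load_settings` function are hypothetical, and `.env` file parsing is not covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    model_refresh_ttl: int = 600   # seconds
    max_attempts: int = 8
    cooldown_seconds: int = 60     # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    key = env.get("OPENROUTER_API_KEY", "")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        model_refresh_ttl=int(env.get("MODEL_REFRESH_TTL", 600)),
        max_attempts=int(env.get("MAX_ATTEMPTS", 8)),
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", 60)),
    )


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"OPENROUTER_API_KEY": "sk-or-example", "MAX_ATTEMPTS": "4"})
print(settings.max_attempts, settings.model_refresh_ttl)  # 4 600
```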

## Tests

```bash
pytest
```

## License

MIT — see [LICENSE](LICENSE).
