{"id":50718170,"url":"https://github.com/Chleba/ollamaMQ","last_synced_at":"2026-06-26T22:00:36.915Z","repository":{"id":340891321,"uuid":"1168034462","full_name":"Chleba/ollamaMQ","owner":"Chleba","description":"High-performance Ollama proxy with per-user fair-share queuing, round-robin scheduling, and a real-time TUI dashboard. Built in Rust.","archived":false,"fork":false,"pushed_at":"2026-06-01T00:13:05.000Z","size":18317,"stargazers_count":98,"open_issues_count":0,"forks_count":9,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-01T00:19:31.149Z","etag":null,"topics":["fair-share","llm-ops","message-queue","ollama","openai-compatible","proxy","rust","tui"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Chleba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["Chleba"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":"chlebikn","custom":null}},"created_at":"2026-02-27T00:13:13.000Z","updated_at":"2026-06-01T00:05:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Chleba/ollamaMQ","commit_stats":null,"previous_names":["chleba/ollamamq"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/Chleba/ollamaMQ","repository_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chleba%2FollamaMQ","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chleba%2FollamaMQ/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chleba%2FollamaMQ/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chleba%2FollamaMQ/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Chleba","download_url":"https://codeload.github.com/Chleba/ollamaMQ/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chleba%2FollamaMQ/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34834415,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fair-share","llm-ops","message-queue","ollama","openai-compatible","proxy","rust","tui"],"created_at":"2026-06-09T21:00:25.962Z","updated_at":"2026-06-26T22:00:36.900Z","avatar_url":"https://github.com/Chleba.png","language":"Rust","funding_links":["https://github.com/sponsors/Chleba","https://buymeacoffee.com/chlebikn"],"categories":["*Ops for AI","LLMOps"],"sub_categories":["LLMOps","LLM Gateways \u0026 Proxies"],"readme":"# ollamaMQ\n\n`ollamaMQ` is a high-performance, asynchronous message queue dispatcher and 
load balancer designed to sit in front of one or more [Ollama](https://ollama.ai/) or [LM Studio](https://lmstudio.ai/) API instances. It acts as a smart proxy that queues incoming requests from multiple users and dispatches them in parallel to multiple backends using a fair-share round-robin scheduler with least-connections load balancing.\n\n![Rust](https://img.shields.io/badge/rust-2024-orange.svg)\n![License](https://img.shields.io/badge/license-MIT-blue.svg)\n![Ollama](https://img.shields.io/badge/Ollama-Proxy-7ed321.svg)\n\n## 🚀 Features\n\n- **Multi-Backend Load Balancing**: Distribute requests across multiple Ollama or LM Studio instances using a **Least Connections + Round Robin** strategy. Automatically detects backend API type (Ollama `/api/*` vs OpenAI `/v1/*`) and routes each request to a compatible backend.\n- **Model-Aware Routing**: Automatically identifies the requested model from the request body and routes the request only to backends that have that specific model loaded. This prevents 404 errors when different models are distributed across multiple backends.\n- **Smart Model Matching**: Robust matching that handles common variations like `:latest` tags and case-insensitivity. For example, a request for `llama3` will correctly match `llama3:latest` on the backend.\n- **Parallel Processing**: Unlike basic proxies, `ollamaMQ` can process multiple requests simultaneously (one per available backend), significantly increasing throughput for multiple users.\n- **Backend Health Checks**: Automatically monitors backend status every 10 seconds. Probes for both API type (Ollama vs OpenAI) and the list of currently available models (via `/api/tags` and `/v1/models`). 
Offline instances are temporarily skipped and marked in the TUI.\n- **Per-User Queuing**: Each user (identified by the `X-User-ID` header) has their own FIFO queue.\n- **Fair-Share Scheduling**: Prevents any single user from monopolizing all available backends.\n- **Transparent Header Forwarding**: Full support for all HTTP headers (including `X-User-ID`) passed to and from the backend, ensuring compatibility with tools like **Claude Code**.\n- **VIP \u0026 Boost Modes**: Absolute priority (VIP) or increased frequency (Boost) for specific users.\n- **Real-Time TUI Dashboard**: Monitor backend health, active requests, queue depths, and throughput in real-time.\n- **OpenAI Compatibility**: Supports standard OpenAI-compatible endpoints.\n- **Async Architecture**: Built on `tokio` and `axum` for high concurrency.\n\n![ollamaMQ TUI Dashboard](demo.gif)\n\n## 🛠️ Installation\n\nEnsure you have [Rust](https://rustup.rs/) (2024 edition or later) and [Ollama](https://ollama.ai/) installed.\n\n### Option 1: Install via Cargo (Recommended)\n\n```bash\ncargo install ollamaMQ\n```\n\n### Option 2: From Source\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/Chleba/ollamaMQ.git\n   cd ollamaMQ\n   ```\n\n2. Build and install locally:\n   ```bash\n   cargo install --path .\n   ```\n\n## 🏃 Usage\n\n### Docker Installation\n\n#### Using Docker Compose (Recommended)\n\n1. Ensure Docker and Docker Compose are installed.\n2. Start your local Ollama instance (defaulting to `localhost:11434`).\n3. 
Run:\n   ```bash\n   docker compose up -d\n   ```\n\n#### Using Docker CLI\n\nFirst build the image from the local Dockerfile:\n\n```bash\ndocker build -t chlebon/ollamamq .\n```\n\nThen run the container:\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 11435:11435 \\\n  --restart unless-stopped \\\n  chlebon/ollamamq\n```\n\n### Command Line Arguments\n\n`ollamaMQ` supports several options to configure the proxy:\n\n- `-p, --port \u003cPORT\u003e`: Port to listen on (default: `11435`)\n- `-o, --backend-urls \u003cURL1,URL2\u003e`: Comma-separated list of backend server URLs (Ollama, LM Studio, etc.) (default: `http://localhost:11434`)\n- `-t, --timeout \u003cSECONDS\u003e`: Request timeout in seconds (default: `300`)\n- `--no-tui`: Disable the interactive TUI dashboard (useful for Docker/CI)\n- `--allow-all-routes`: Enable fallback proxy for non-standard endpoints\n- `-h, --help`: Print help message\n- `-V, --version`: Print version information\n\n**Example:**\n\n```bash\nollamaMQ --port 8080 --backend-urls http://10.0.0.1:11434,http://10.0.0.2:11434 --timeout 600\n```\n\n**Docker Example:**\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 8080:8080 \\\n  chlebon/ollamamq --port 8080 --backend-urls http://192.168.1.5:11434 --timeout 600\n```\n\n### API Proxying\n\nPoint your LLM clients to the `ollamaMQ` port (`11435`) and include the `X-User-ID` header.\n\n#### Supported Endpoints:\n\n- `GET /health` (Internal health check)\n- `GET /` (Backend Status)\n- `POST /api/generate`\n- `POST /api/chat`\n- `POST /api/embed`\n- `POST /api/embeddings`\n- `GET /api/tags`\n- `POST /api/show`\n- `POST /api/create`\n- `POST /api/copy`\n- `DELETE /api/delete`\n- `POST /api/pull`\n- `POST /api/push`\n- `GET/HEAD/POST /api/blobs/{digest}`\n- `GET /api/ps`\n- `GET /api/version`\n- `POST /v1/chat/completions` (OpenAI Compatible)\n- `POST /v1/completions` (OpenAI Compatible)\n- `POST /v1/embeddings` (OpenAI Compatible)\n- `GET /v1/models` (OpenAI Compatible)\n- `GET 
/v1/models/{model}` (OpenAI Compatible)\n\n\n#### Example (cURL):\n\n```bash\ncurl -X POST http://localhost:11435/api/chat \\\n  -H \"X-User-ID: developer-1\" \\\n  -d '{\n    \"model\": \"qwen3.5:35b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Explain quantum computing.\"}],\n    \"stream\": true\n  }'\n```\n\n### Dashboard Controls\n\nThe interactive TUI dashboard provides a live view of the dispatcher's state:\n\n- **`j` / `k`** or **Arrows**: Navigate the selected list (Users, Backends, or Blocked Items).\n- **`Tab`** or **`h` / `l`**: Switch between the **Backends**, **Users**, and **Blocked** panels.\n- **`Space`** or **`Enter`**: Expand/collapse the available models list for the selected backend (in the Backends panel).\n- **`p`**: Toggle **VIP** status for the selected user (absolute priority).\n- **`b`**: Toggle **Boost** status for the selected user (prioritizes every 2nd request).\n- **`x`**: Block the selected user.\n- **`X`**: Block the selected user's IP address.\n- **`u`**: Unblock the selected user or IP (works in both panels).\n- **`q`** or **Esc**: Exit the dashboard and stop the application.\n- **`?`**: Toggle detailed help overlay.\n\n**Visual Indicators:**\n- `▶` / `▼`: Indicates if a backend's model list is collapsed or expanded.\n- `★` (Magenta): **VIP User** (absolute priority).\n- `⚡` (Yellow): **Boosted User** (every 2nd request priority).\n- `▶` (Cyan): Request is currently being processed/streamed.\n- `●` (Green): Backend is Online or User has requests waiting in the queue.\n- `○` (Gray): User is idle or Backend is Offline.\n- `✖` (Red): User or IP is blocked.\n\n### Logging\n\nLogs are automatically written to `ollamamq.log` in the current working directory. 
This keeps the terminal clear for the TUI dashboard while allowing you to monitor system events and debug backend communication.\n\n## 🐳 Docker\n\n### Docker Compose\n\nThe included `docker-compose.yml` provides a ready-to-use configuration:\n\n```yaml\nservices:\n  ollamamq:\n    build: .\n    image: chlebon/ollamamq:latest\n    container_name: ollamamq\n    ports:\n      - \"11435:11435\"\n    environment:\n      - OLLAMA_URLS=http://host.docker.internal:11434\n      - PORT=11435\n    extra_hosts:\n      - \"host.docker.internal:host-gateway\"\n    restart: unless-stopped\n```\n\n**Note for Linux Users:**\nWhen running in Docker on Linux to access a host-based Ollama:\n\n1.  **Listen on all interfaces:** Ollama must be configured to listen on `0.0.0.0`. You can do this by setting `export OLLAMA_HOST=0.0.0.0` before starting the Ollama service (or editing the systemd unit file).\n2.  **Firewall:** Ensure your firewall (e.g., `ufw`) allows traffic from the Docker bridge network (usually `172.17.0.0/16`) to port `11434`.\n3.  
**Host Gateway:** The `extra_hosts` setting in `docker-compose.yml` maps `host.docker.internal` to your host's IP address.\n\n### Dockerfile\n\nThe Dockerfile uses a multi-stage build:\n\n- **Build stage**: Uses `rust:1.85-alpine` to compile the release binary\n- **Runtime stage**: Uses `alpine:3.20` with only `ca-certificates` for a minimal footprint (~10MB)\n\n### Environment Variables\n\n| Variable      | Description                    | Default                  |\n| ------------- | ------------------------------ | ------------------------ |\n| `OLLAMA_URLS` | URLs of the Ollama servers     | `http://localhost:11434` |\n| `PORT`        | Port for ollamaMQ to listen on | `11435`                  |\n| `TIMEOUT`     | Request timeout in seconds     | `300`                    |\n\n### Connecting to Different Ollama Servers\n\n#### Local Ollama (on host machine)\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 11435:11435 \\\n  -e OLLAMA_URLS=http://host.docker.internal:11434 \\\n  chlebon/ollamamq\n```\n\n#### Remote Ollama Server\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 11435:11435 \\\n  -e OLLAMA_URLS=https://ollama.example.com:11434 \\\n  chlebon/ollamamq\n```\n\n#### Custom Port on Same Server\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 8080:8080 \\\n  -e OLLAMA_URLS=http://host.docker.internal:11436 \\\n  -e PORT=8080 \\\n  chlebon/ollamamq\n```\n\n#### Ollama in Docker (different container)\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  --network ollama-network \\\n  -p 11435:11435 \\\n  -e OLLAMA_URLS=http://ollama:11434 \\\n  chlebon/ollamamq\n```\n\n### Port Configuration\n\n- **11435**: The proxy port that clients connect to (exposed by default)\n- **11434**: The Ollama server port (internal, not exposed)\n\nTo change the proxy port, use the `PORT` environment variable:\n\n```bash\ndocker run -d \\\n  --name ollamamq \\\n  -p 8080:8080 \\\n  -e PORT=8080 \\\n  chlebon/ollamamq\n```\n\n## 🏗️ Architecture\n\n- 
**`src/main.rs`**: Entry point, HTTP server initialization, and TUI lifecycle management.\n- **`src/dispatcher.rs`**: Core logic for queuing, round-robin scheduling, and Ollama proxying.\n- **`src/tui.rs`**: Implementation of the terminal-based monitoring dashboard.\n\n### Request Flow\n\n1. Client sends a request with `X-User-ID`.\n2. `ollamaMQ` pushes the request into a user-specific queue.\n3. The background worker checks for available backends (Online \u0026 not busy).\n4. If a backend is free, the worker pops the next task (fair-share rotation) and **spawns a parallel task**.\n5. The request is proxied to the selected Ollama backend.\n6. The response is streamed back to the client in real-time, while the worker can immediately start another task on a different backend.\n\n## 📦 Publishing to Docker Hub\n\nTo publish a new version of `ollamaMQ` to Docker Hub, follow these steps:\n\n1. **Update Version**: Update the version number in `Cargo.toml`.\n2. **Build and Tag**:\n\n   ```bash\n   # Build the image for the current version\n   docker build -t chlebon/ollamamq:v0.2.4 .\n   \n   # Tag it as latest\n   docker tag chlebon/ollamamq:v0.2.4 chlebon/ollamamq:latest\n   ```\n\n3. 
**Push to Hub**:\n\n   ```bash\n   # Log in to Docker Hub (if not already logged in)\n   docker login\n   \n   # Push the versioned tag\n   docker push chlebon/ollamamq:v0.2.4\n   \n   # Push the latest tag\n   docker push chlebon/ollamamq:latest\n   ```\n\n## 🧪 Development\n\n### Stress Testing\n\nYou can use the provided `test_dispatcher.sh` script to simulate multiple users and verify the dispatcher's behavior under load:\n\n```bash\n./test_dispatcher.sh\n```\n\n![ollamaMQ Stress Test](demo-test.gif)\n\n## 📝 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details (if applicable).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChleba%2FollamaMQ","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FChleba%2FollamaMQ","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChleba%2FollamaMQ/lists"}