{"id":24496333,"url":"https://github.com/inference-gateway/inference-gateway","last_synced_at":"2026-04-28T04:04:08.623Z","repository":{"id":272206827,"uuid":"914086615","full_name":"inference-gateway/inference-gateway","owner":"inference-gateway","description":"An open-source, cloud-native, high-performance gateway unifying multiple LLM providers, from local solutions like Ollama to major cloud providers such as OpenAI, Groq, Cohere, Anthropic, Cloudflare and DeepSeek.","archived":false,"fork":false,"pushed_at":"2026-04-03T12:59:27.000Z","size":2691,"stargazers_count":107,"open_issues_count":4,"forks_count":18,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-03T17:30:10.974Z","etag":null,"topics":["agnostic","anthropic","api","cohere","deepseek-v3-2","gateway","gateway-api","golang","inference-api","kubernetes","llm","openai","opensource","opensource-projects","opentelemetry","performance","proxy","proxy-server","self-hosted","tracing"],"latest_commit_sha":null,"homepage":"https://docs.inference-gateway.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/inference-gateway.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-01-08T23:27:25.000Z","updated_at":"2026-04-03T12:59:29.000Z","dependencies_parsed_at":"2026-03-19T04:03:39.273Z","dependency_job_id":null,"html_url":"https://github.com/inference-gateway/inference-gateway","commit_stats":null,"previous_names":[
"edenreich/inference-gateway","inference-gateway/inference-gateway"],"tags_count":150,"template":false,"template_full_name":null,"purl":"pkg:github/inference-gateway/inference-gateway","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inference-gateway%2Finference-gateway","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inference-gateway%2Finference-gateway/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inference-gateway%2Finference-gateway/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inference-gateway%2Finference-gateway/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/inference-gateway","download_url":"https://codeload.github.com/inference-gateway/inference-gateway/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inference-gateway%2Finference-gateway/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463015,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agnostic","anthropic","api","cohere","deepseek-v3-2","gateway","gateway-api","golang","inference-api","kubernetes","llm","openai","opensource","opensource-projects","opentelemet
ry","performance","proxy","proxy-server","self-hosted","tracing"],"created_at":"2025-01-21T21:16:36.994Z","updated_at":"2026-04-06T07:01:43.834Z","avatar_url":"https://github.com/inference-gateway.png","language":"Go","funding_links":[],"categories":["Tools \u0026 Libraries","LLM 部署与推理 (Deployment \u0026 Inference)","Quick Comparison"],"sub_categories":["Integration Platforms","推理网关 (Inference Gateways)"],"readme":"\u003ch1 align=\"center\"\u003eInference Gateway\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003c!-- CI Status Badge --\u003e\n  \u003ca href=\"https://github.com/inference-gateway/inference-gateway/actions/workflows/ci.yml?query=branch%3Amain\"\u003e\n    \u003cimg\n      src=\"https://github.com/inference-gateway/inference-gateway/actions/workflows/ci.yml/badge.svg?branch=main\"\n      alt=\"CI Status\"/\u003e\n  \u003c/a\u003e\n  \u003c!-- Version Badge --\u003e\n  \u003ca href=\"https://github.com/inference-gateway/inference-gateway/releases\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/v/release/inference-gateway/inference-gateway?color=blue\u0026style=flat-square\"\n         alt=\"Version\"/\u003e\n  \u003c/a\u003e\n  \u003c!-- License Badge --\u003e\n  \u003ca href=\"https://github.com/inference-gateway/inference-gateway/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/inference-gateway/inference-gateway?color=blue\u0026style=flat-square\" alt=\"License\"/\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nThe Inference Gateway is a proxy server designed to facilitate access to various\nlanguage model APIs. 
It allows users to interact with different language models\nthrough a unified interface, simplifying configuration and the process of\nsending requests to and receiving responses from multiple LLMs, which makes it\neasy to combine models in a Mixture of Experts setup.\n\n- [Key Features](#key-features)\n- [Overview](#overview)\n- [Installation](#installation)\n- [Middleware Control and Bypass Mechanisms](#middleware-control-and-bypass-mechanisms)\n- [Model Context Protocol (MCP) Integration](#model-context-protocol-mcp-integration)\n- [Metrics and Observability](#metrics-and-observability)\n- [Supported APIs](#supported-apis)\n- [Configuration](#configuration)\n- [Examples](#examples)\n- [SDKs](#sdks)\n- [CLI Tool](#cli-tool)\n- [License](#license)\n- [Contributing](#contributing)\n- [Motivation](#motivation)\n\n## Key Features\n\n- 📜 **Open Source**: Available under the MIT License.\n- 🚀 **Unified API Access**: Proxy requests to multiple language model APIs,\n  including OpenAI, Ollama, Ollama Cloud, Groq, Cohere, etc.\n- ⚙️ **Environment Configuration**: Easily configure API keys and URLs through environment variables.\n- 🔧 **Tool-use Support**: Enable function calling capabilities across supported\n  providers with a unified API.\n- 🌐 **MCP Support**: Full Model Context Protocol integration - automatically\n  discover and expose tools from MCP servers to LLMs without client-side tool\n  management.\n- 🌊 **Streaming Responses**: Stream tokens in real time as they're generated by language models.\n- 🖼️ **Vision/Multimodal Support**: Process images alongside text with vision-capable models.\n- 🐳 **Docker Support**: Use Docker and Docker Compose for easy setup and deployment.\n- ☸️ **Kubernetes Support**: Ready for deployment in Kubernetes environments.\n- 📊 **OpenTelemetry**: Monitor and analyze performance.\n- 🛡️ **Production Ready**: Built with production in mind, with configurable timeouts and TLS support.\n- 🌿 **Lightweight**: Includes only essential libraries and runtime, resulting\n  in a small binary of 
~10.8MB.\n- 📉 **Minimal Resource Consumption**: Designed to consume minimal resources and have a lower footprint.\n- 📚 **Documentation**: Well documented with examples and guides.\n- 🧪 **Tested**: Extensively tested with unit tests and integration tests.\n- 🛠️ **Maintained**: Actively maintained and developed.\n- 📈 **Scalable**: Easily scalable and can be used in a distributed environment\n  with \u003ca href=\"https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/\" target=\"_blank\"\u003eHPA\u003c/a\u003e\n  in Kubernetes.\n- 🔒 **Compliance** and Data Privacy: This project does not collect data or\n  analytics, ensuring compliance and data privacy.\n- 🏠 **Self-Hosted**: Can be self-hosted for complete control over the deployment environment.\n- ⌨️ **CLI Tool**: Improved command-line interface for managing and\n  interacting with the Inference Gateway\n\n## Overview\n\nYou can horizontally scale the Inference Gateway to handle multiple requests\nfrom clients. The Inference Gateway will forward the requests to the respective\nprovider and return the response to the client.\n\n**Note**: MCP middleware components can be easily toggled on/off via\nenvironment variables (`MCP_ENABLE`) or bypassed per-request using headers\n(`X-MCP-Bypass`), giving you full control over which capabilities are active.\n\n**Note**: Vision/multimodal support is disabled by default for security and\nperformance. 
To enable image processing with vision-capable models (GPT-4o,\nClaude 4.5, Gemini 2.5, etc.), set `ENABLE_VISION=true` in your environment\nconfiguration.\n\nThe following diagram illustrates the flow:\n\n```mermaid\n%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#326CE5', 'primaryTextColor': '#fff', 'lineColor': '#5D8AA8', 'secondaryColor': '#006100' }, 'fontFamily': 'Arial', 'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'padding': 15}}}%%\n\n\ngraph TD\n    %% Client nodes\n    A[\"👥 Clients / 🤖 Agents\"] --\u003e |POST /v1/chat/completions| Auth\n\n    %% Auth node\n    Auth[\"🔒 Optional OIDC\"] --\u003e |Auth?| IG1\n    Auth --\u003e |Auth?| IG2\n    Auth --\u003e |Auth?| IG3\n\n    %% Gateway nodes\n    IG1[\"🖥️ Inference Gateway\"] --\u003e P\n    IG2[\"🖥️ Inference Gateway\"] --\u003e P\n    IG3[\"🖥️ Inference Gateway\"] --\u003e P\n\n    %% Middleware Processing and Direct Routing\n    P[\"🔌 Proxy Gateway\"] --\u003e MCP[\"🌐 MCP Middleware\"]\n    P --\u003e |\"Direct routing bypassing middleware\"| Direct[\"🔌 Direct Providers\"]\n    MCP --\u003e |\"Middleware chain complete\"| Providers[\"🤖 LLM Providers\"]\n\n    %% MCP Tool Servers\n    MCP --\u003e MCP1[\"📁 File System Server\"]\n    MCP --\u003e MCP2[\"🔍 Search Server\"]\n    MCP --\u003e MCP3[\"🌐 Web Server\"]\n\n    %% LLM Providers (Middleware Enhanced)\n    Providers --\u003e C1[\"🦙 Ollama\"]\n    Providers --\u003e D1[\"🚀 Groq\"]\n    Providers --\u003e E1[\"☁️ OpenAI\"]\n\n    %% Direct Providers (Bypass Middleware)\n    Direct --\u003e C[\"🦙 Ollama\"]\n    Direct --\u003e D[\"🚀 Groq\"]\n    Direct --\u003e E[\"☁️ OpenAI\"]\n    Direct --\u003e G[\"⚡ Cloudflare\"]\n    Direct --\u003e H1[\"💬 Cohere\"]\n    Direct --\u003e H2[\"🧠 Anthropic\"]\n    Direct --\u003e H3[\"🐋 DeepSeek\"]\n\n    %% Define styles\n    classDef client fill:#9370DB,stroke:#333,stroke-width:1px,color:white;\n    classDef auth fill:#F5A800,stroke:#333,stroke-width:1px,color:black;\n    classDef 
gateway fill:#326CE5,stroke:#fff,stroke-width:1px,color:white;\n    classDef provider fill:#32CD32,stroke:#333,stroke-width:1px,color:white;\n    classDef mcp fill:#FF69B4,stroke:#333,stroke-width:1px,color:white;\n\n    %% Apply styles\n    class A client;\n    class Auth auth;\n    class IG1,IG2,IG3,P gateway;\n    class C,D,E,G,H1,H2,H3,C1,D1,E1,Providers provider;\n    class MCP,MCP1,MCP2,MCP3 mcp;\n    class Direct direct;\n```\n\nThe client sends:\n\n```bash\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -d '{\n    \"model\": \"openai/gpt-3.5-turbo\",\n    \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"You are a pirate.\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"Hello, world! How are you doing today?\"\n      }\n    ]\n  }'\n```\n\nInternally, the request is proxied to OpenAI; the Inference Gateway infers the provider from the model name.\n\nYou can also specify the provider explicitly by adding `?provider=openai` (or any other supported provider) to the URL.\n\nFinally, the client receives:\n\n```json\n{\n  \"choices\": [\n    {\n      \"finish_reason\": \"stop\",\n      \"index\": 0,\n      \"message\": {\n        \"content\": \"Ahoy, matey! 🏴‍☠️ The seas be wild, the sun be bright, and this here pirate be ready to conquer the day! What be yer business, landlubber? 🦜\",\n        \"role\": \"assistant\"\n      }\n    }\n  ],\n  \"created\": 1741821109,\n  \"id\": \"chatcmpl-dc24995a-7a6e-4d95-9ab3-279ed82080bb\",\n  \"model\": \"N/A\",\n  \"object\": \"chat.completion\",\n  \"usage\": {\n    \"completion_tokens\": 0,\n    \"prompt_tokens\": 0,\n    \"total_tokens\": 0\n  }\n}\n```\n\nTo stream tokens, add `\"stream\": true` to the request body.\n\n## Installation\n\n\u003e **Recommended**: For production deployments, running the Inference Gateway as\n\u003e a container is recommended. This provides better isolation, easier updates,\n\u003e and simplified configuration management. 
See [Docker](examples/docker-compose/)\n\u003e or [Kubernetes](examples/kubernetes/) deployment examples.\n\nThe Inference Gateway can also be installed as a standalone binary using the\nprovided install script or by downloading pre-built binaries from GitHub\nreleases.\n\n### Using Install Script\n\nThe easiest way to install the Inference Gateway is using the automated install script:\n\n**Install latest version:**\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | bash\n```\n\n**Install specific version:**\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | VERSION=v0.22.3 bash\n```\n\n**Install to custom directory:**\n\n```bash\n# Install to custom location\ncurl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | INSTALL_DIR=~/.local/bin bash\n\n# Install to current directory\ncurl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | INSTALL_DIR=. bash\n```\n\n**What the script does:**\n\n- Automatically detects your operating system (Linux/macOS) and architecture (x86_64/arm64/armv7)\n- Downloads the appropriate binary from GitHub releases\n- Extracts and installs to `/usr/local/bin` (or custom directory)\n- Verifies the installation\n\n**Supported platforms:**\n\n- Linux: x86_64, arm64, armv7\n- macOS (Darwin): x86_64 (Intel), arm64 (Apple Silicon)\n\n### Manual Download\n\nDownload pre-built binaries directly from the [releases page](https://github.com/inference-gateway/inference-gateway/releases):\n\n1. Download the appropriate archive for your platform\n2. Extract the binary:\n\n   ```bash\n   tar -xzf inference-gateway_\u003cOS\u003e_\u003cARCH\u003e.tar.gz\n   ```\n\n3. 
Move to a directory in your PATH:\n\n   ```bash\n   chmod +x inference-gateway\n   sudo mv inference-gateway /usr/local/bin/\n   ```\n\n### Verify Installation\n\n```bash\ninference-gateway --version\n```\n\n### Running the Gateway\n\nOnce installed, start the gateway with your configuration:\n\n```bash\n# Set required environment variables\nexport OPENAI_API_KEY=\"your-api-key\"\n\n# Start the gateway\ninference-gateway\n```\n\nFor detailed configuration options, see the [Configuration](#configuration) section below.\n\n## Middleware Control and Bypass Mechanisms\n\nThe Inference Gateway uses middleware to process requests and add capabilities\nlike MCP (Model Context Protocol). Clients can control which middlewares are\nactive using bypass headers:\n\n### Bypass Headers\n\n- **`X-MCP-Bypass`**: Skip MCP middleware processing\n\n### Client Control Examples\n\n```bash\n# Use only standard tool calls (skip MCP)\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"X-MCP-Bypass: true\" \\\n  -d '{\n    \"model\": \"anthropic/claude-3-haiku\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Connect to external agents\"}]\n  }'\n\n# Skip the MCP middleware for direct provider access\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"X-MCP-Bypass: true\" \\\n  -d '{\n    \"model\": \"groq/llama-3-8b\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Simple chat without tools\"}]\n  }'\n```\n\n### When to Use Bypass Headers\n\n**For Performance:**\n\n- Skip middleware processing when you don't need tool capabilities\n- Reduce latency for simple chat interactions\n\n**For Selective Features:**\n\n- Use only standard tool calls (skip MCP): Add `X-MCP-Bypass: true`\n- Direct provider access\n\n**For Development:**\n\n- Test middleware behavior in isolation\n- Debug tool integration issues\n- Ensure backward compatibility with existing applications\n\n### How It Works Internally\n\nThe middlewares use these same 
headers to prevent infinite loops during their operation:\n\n**MCP Processing:**\n\n- When tools are detected in a response, the MCP agent makes up to 10 follow-up requests\n- Each follow-up request includes `X-MCP-Bypass: true` to skip middleware re-processing\n- This allows the agent to iterate without creating circular calls\n\n\u003e **Note**: These bypass headers only affect middleware processing. The core\n\u003e chat completions functionality remains available regardless of header values.\n\n## Model Context Protocol (MCP) Integration\n\nEnable MCP to automatically provide tools to LLMs without requiring clients to\nmanage them:\n\n```bash\n# Enable MCP and connect to tool servers\nexport MCP_ENABLE=true\nexport MCP_SERVERS=\"http://filesystem-server:3001/mcp,http://search-server:3002/mcp\"\n\n# LLMs will automatically discover and use available tools\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -d '{\n    \"model\": \"openai/gpt-4\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"List files in the current directory\"}]\n  }'\n```\n\nThe gateway automatically injects available tools into requests and handles tool\nexecution, making external capabilities seamlessly available to any LLM.\n\n\u003e **Learn more**:\n\u003e [Model Context Protocol Documentation](https://modelcontextprotocol.io/) |\n\u003e [MCP Integration Example](examples/docker-compose/mcp/)\n\n## Metrics and Observability\n\nThe Inference Gateway provides comprehensive OpenTelemetry metrics for\nmonitoring performance, usage, and function/tool call activity. 
Metrics are\nautomatically exported to Prometheus format and available on port 9464 by\ndefault.\n\n### Enabling Metrics\n\n```bash\n# Enable telemetry and set metrics port (default: 9464)\nexport TELEMETRY_ENABLE=true\nexport TELEMETRY_METRICS_PORT=9464\n\n# Access metrics endpoint\ncurl http://localhost:9464/metrics\n```\n\n### Available Metrics\n\n#### Token Usage Metrics\n\nTrack token consumption across different providers and models:\n\n- **`llm_usage_prompt_tokens_total`** - Counter for prompt tokens consumed\n- **`llm_usage_completion_tokens_total`** - Counter for completion tokens generated\n- **`llm_usage_total_tokens_total`** - Counter for total token usage\n\n**Labels**: `provider`, `model`\n\n```promql\n# Total tokens used by OpenAI models in the last hour\nsum(increase(llm_usage_total_tokens_total{provider=\"openai\"}[1h])) by (model)\n```\n\n#### Request/Response Metrics\n\nMonitor API performance and reliability:\n\n- **`llm_requests_total`** - Counter for total requests processed\n- **`llm_responses_total`** - Counter for responses by HTTP status code\n- **`llm_request_duration`** - Histogram for end-to-end request duration (milliseconds)\n\n**Labels**: `provider`, `request_method`, `request_path`, `status_code` (responses only)\n\n```promql\n# 95th percentile request latency by provider\nhistogram_quantile(0.95, sum(rate(llm_request_duration_bucket{provider=~\"openai|anthropic\"}[5m])) by (provider, le))\n\n# Error rate percentage by provider\n100 * sum(rate(llm_responses_total{status_code!~\"2..\"}[5m])) by (provider) / sum(rate(llm_responses_total[5m])) by (provider)\n```\n\n#### Function/Tool Call Metrics\n\nComprehensive tracking of tool executions for MCP, and standard function calls:\n\n- **`llm_tool_calls_total`** - Counter for total function/tool calls executed\n- **`llm_tool_calls_success_total`** - Counter for successful tool executions\n- **`llm_tool_calls_failure_total`** - Counter for failed tool executions\n- 
**`llm_tool_call_duration`** - Histogram for tool execution duration (milliseconds)\n\n**Labels**: `provider`, `model`, `tool_type`, `tool_name`, `error_type` (failures only)\n\n**Tool Types**:\n\n- `mcp` - Model Context Protocol tools (prefix: `mcp_`)\n- `standard_tool_use` - Other function calls\n\n```promql\n# Tool call success rate by type\n100 * sum(rate(llm_tool_calls_success_total[5m])) by (tool_type) / sum(rate(llm_tool_calls_total[5m])) by (tool_type)\n\n# Average tool execution time by provider\nsum(rate(llm_tool_call_duration_sum[5m])) by (provider) / sum(rate(llm_tool_call_duration_count[5m])) by (provider)\n\n# Most frequently used tools\ntopk(10, sum(increase(llm_tool_calls_total[1h])) by (tool_name))\n```\n\n### Monitoring Setup\n\n#### Docker Compose Example\n\nComplete monitoring stack with Grafana dashboards:\n\n```bash\ncd examples/docker-compose/monitoring/\ncp .env.example .env  # Configure your API keys\ndocker compose up -d\n\n# Access Grafana at http://localhost:3000 (admin/admin)\n```\n\n#### Kubernetes Example\n\nProduction-ready monitoring with Prometheus Operator:\n\n```bash\ncd examples/kubernetes/monitoring/\ntask deploy-infrastructure\ntask deploy-inference-gateway\n\n# Access via port-forward or ingress\nkubectl port-forward svc/grafana-service 3000:3000\n```\n\n### Grafana Dashboard\n\nThe included Grafana dashboard provides:\n\n- **Real-time Metrics**: 5-second refresh rate for immediate feedback\n- **Tool Call Analytics**: Success rates, duration analysis, and failure\n  tracking\n- **Provider Comparison**: Performance metrics across all supported providers\n- **Usage Insights**: Token consumption patterns and cost analysis\n- **Error Monitoring**: Failed requests and tool call error classification\n\n\u003e **Learn more**:\n\u003e [Docker Compose Monitoring](examples/docker-compose/monitoring/) |\n\u003e [Kubernetes Monitoring](examples/kubernetes/monitoring/) |\n\u003e [OpenTelemetry 
Documentation](https://opentelemetry.io/)\n\n## Supported APIs\n\n- [OpenAI](https://platform.openai.com/)\n- [Ollama](https://ollama.com/)\n- [Ollama Cloud](https://ollama.com/cloud) (Preview)\n- [Groq](https://console.groq.com/)\n- [Cloudflare](https://www.cloudflare.com/)\n- [Cohere](https://docs.cohere.com/docs/the-cohere-platform)\n- [Anthropic](https://docs.anthropic.com/en/api/getting-started)\n- [DeepSeek](https://api-docs.deepseek.com/)\n- [Google](https://aistudio.google.com/)\n- [Mistral](https://mistral.ai/)\n\n## Configuration\n\nThe Inference Gateway is configured using environment variables. See\n[Configurations.md](./Configurations.md) for the supported variables.\n\n### Vision/Multimodal Support\n\nTo enable vision capabilities for processing images alongside text, set:\n\n```bash\nENABLE_VISION=true\n```\n\n**Supported Providers with Vision:**\n\n- OpenAI (GPT-4o, GPT-5, GPT-4.1, GPT-4 Turbo)\n- Anthropic (Claude 3, Claude 4, Claude 4.5 Sonnet, Claude 4.5 Haiku)\n- Google (Gemini 2.5)\n- Cohere (Command A Vision, Aya Vision)\n- Ollama (LLaVA, Llama 4, Llama 3.2 Vision)\n- Groq (vision models)\n- Mistral (Pixtral)\n\n**Note**: Vision support is disabled by default for performance and security\nreasons. 
When disabled, requests with image content will be rejected even if the\nmodel supports vision.\n\n## Examples\n\n- Using [Docker Compose](examples/docker-compose/)\n  - [Basic setup](examples/docker-compose/basic/) - Simple configuration with a\n    single provider\n  - [MCP Integration](examples/docker-compose/mcp/) - Model Context Protocol with\n    multiple tool servers\n  - [Hybrid deployment](examples/docker-compose/hybrid/) - Multiple providers\n    (cloud + local)\n  - [Authentication](examples/docker-compose/authentication/) - OIDC\n    authentication setup\n  - [Tools](examples/docker-compose/tools/) - Tool integration examples\n- Using [Kubernetes](examples/kubernetes/)\n  - [Basic setup](examples/kubernetes/basic/) - Simple Kubernetes deployment\n  - [MCP Integration](examples/kubernetes/mcp/) - Model Context Protocol in\n    Kubernetes\n  - [Agent deployment](examples/kubernetes/agent/) - Standalone agent deployment\n  - [Hybrid deployment](examples/kubernetes/hybrid/) - Multiple providers in\n    Kubernetes\n  - [Authentication](examples/kubernetes/authentication/) - OIDC authentication\n    in Kubernetes\n  - [Monitoring](examples/kubernetes/monitoring/) - Observability and monitoring\n    setup\n  - [TLS setup](examples/kubernetes/tls/) - TLS/SSL configuration\n- Using standard [REST endpoints](examples/rest-endpoints/)\n\n## SDKs\n\nMore SDKs could be generated using the OpenAPI specification. The following\nSDKs are currently available:\n\n- [Typescript](https://github.com/inference-gateway/typescript-sdk)\n- [Rust](https://github.com/inference-gateway/rust-sdk)\n- [Go](https://github.com/inference-gateway/go-sdk)\n- [Python](https://github.com/inference-gateway/python-sdk)\n\n## CLI Tool\n\nThe Inference Gateway CLI provides a powerful command-line interface for\nmanaging and interacting with the Inference Gateway. 
It offers tools for\nconfiguration, monitoring, and management of inference services.\n\n### CLI Key Features\n\n- **Status Monitoring**: Check gateway health and resource usage\n- **Interactive Chat**: Chat with models using an interactive interface\n- **Configuration Management**: Manage gateway settings via YAML config\n- **Project Initialization**: Set up local project configurations\n- **Tool Execution**: LLMs can execute whitelisted commands and tools\n\n### CLI Installation\n\n#### Using Go Install\n\n```bash\ngo install github.com/inference-gateway/cli@latest\n```\n\n#### Using CLI Install Script\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash\n```\n\n#### Manual CLI Download\n\nDownload the latest release from the\n[releases page](https://github.com/inference-gateway/cli/releases).\n\n### Quick Start\n\n1. **Initialize project configuration:**\n\n   ```bash\n   infer init\n   ```\n\n2. **Check gateway status:**\n\n   ```bash\n   infer status\n   ```\n\n3. **Start an interactive chat:**\n\n   ```bash\n   infer chat\n   ```\n\nFor more details, see the [CLI documentation](https://github.com/inference-gateway/cli).\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Contributing\n\nFound a bug, missing provider, or have a feature in mind?  \nYou're more than welcome to submit pull requests or open issues for any fixes, improvements, or new ideas!\n\nPlease read the [CONTRIBUTING.md](./CONTRIBUTING.md) for more details.\n\n## Motivation\n\nMy motivation is to build AI Agents without being tied to a single vendor. By\navoiding vendor lock-in and supporting self-hosted LLMs from a single interface,\norganizations gain both portability and data privacy. 
You can choose to consume\nLLMs from a cloud provider or run them entirely offline with Ollama.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finference-gateway%2Finference-gateway","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finference-gateway%2Finference-gateway","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finference-gateway%2Finference-gateway/lists"}