https://github.com/inference-gateway/inference-gateway
An open-source, high-performance gateway unifying multiple LLM providers, from local solutions like Ollama to major cloud providers such as OpenAI, Groq, Cohere, Anthropic, Cloudflare and DeepSeek.
- Host: GitHub
- URL: https://github.com/inference-gateway/inference-gateway
- Owner: inference-gateway
- License: mit
- Created: 2025-01-08T23:27:25.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-12T15:41:15.000Z (10 months ago)
- Last Synced: 2025-04-14T04:14:04.870Z (10 months ago)
- Topics: agnostic, anthropic, api, cohere, deepseek-v3, gateway, gateway-api, golang, inference-api, kubernetes, llm, openai, opensource, opensource-projects, opentelemetry, performance, proxy, proxy-server, self-hosted, tracing
- Language: Go
- Homepage: https://docs.inference-gateway.com
- Size: 739 KB
- Stars: 17
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Inference Gateway
The Inference Gateway is a proxy server designed to facilitate access to various
language model APIs. It allows users to interact with different language models
through a unified interface, simplifying configuration and the process of
sending requests and receiving responses from multiple LLMs, and enabling easy
use of Mixture of Experts setups.
- [Key Features](#key-features)
- [Overview](#overview)
- [Installation](#installation)
- [Middleware Control and Bypass Mechanisms](#middleware-control-and-bypass-mechanisms)
- [Model Context Protocol (MCP) Integration](#model-context-protocol-mcp-integration)
- [Metrics and Observability](#metrics-and-observability)
- [Supported APIs](#supported-apis)
- [Configuration](#configuration)
- [Examples](#examples)
- [SDKs](#sdks)
- [CLI Tool](#cli-tool)
- [Contributing](#contributing)
- [License](#license)
- [Motivation](#motivation)
## Key Features
- **Open Source**: Available under the MIT License.
- **Unified API Access**: Proxy requests to multiple language model APIs,
  including OpenAI, Ollama, Ollama Cloud, Groq, Cohere, and more.
- **Environment Configuration**: Easily configure API keys and URLs through environment variables.
- **Tool-use Support**: Enable function-calling capabilities across supported
  providers with a unified API.
- **MCP Support**: Full Model Context Protocol integration - automatically
  discover and expose tools from MCP servers to LLMs without client-side tool
  management.
- **Streaming Responses**: Stream tokens in real time as they are generated by language models.
- **Vision/Multimodal Support**: Process images alongside text with vision-capable models.
- **Web Interface**: Access through a modern web UI for easy interaction and management.
- **Docker Support**: Use Docker and Docker Compose for easy setup and deployment.
- **Kubernetes Support**: Ready for deployment in Kubernetes environments.
- **OpenTelemetry**: Monitor and analyze performance.
- **Production Ready**: Built with production in mind, with configurable timeouts and TLS support.
- **Lightweight**: Includes only essential libraries and runtime, resulting
  in a small binary of ~10.8 MB.
- **Minimal Resource Consumption**: Designed to consume minimal resources and have a low footprint.
- **Documentation**: Well documented with examples and guides.
- **Tested**: Extensively tested with unit tests and integration tests.
- **Maintained**: Actively maintained and developed.
- **Scalable**: Easily scalable and usable in a distributed environment with HPA
  in Kubernetes.
- **Compliance and Data Privacy**: This project does not collect data or
  analytics, ensuring compliance and data privacy.
- **Self-Hosted**: Can be self-hosted for complete control over the deployment environment.
- **CLI Tool**: Improved command-line interface for managing and
  interacting with the Inference Gateway.
## Overview
You can horizontally scale the Inference Gateway to handle requests from many
clients. The gateway forwards each request to the respective provider and
returns the response to the client.
**Note**: MCP middleware components can be easily toggled on/off via
environment variables (`MCP_ENABLE`) or bypassed per-request using headers
(`X-MCP-Bypass`), giving you full control over which capabilities are active.
**Note**: Vision/multimodal support is disabled by default for security and
performance. To enable image processing with vision-capable models (GPT-4o,
Claude 4.5, Gemini 2.5, etc.), set `ENABLE_VISION=true` in your environment
configuration.
The following diagram illustrates the flow:
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#326CE5', 'primaryTextColor': '#fff', 'lineColor': '#5D8AA8', 'secondaryColor': '#006100' }, 'fontFamily': 'Arial', 'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'padding': 15}}}%%
graph TD
%% Client nodes
A["π₯ Clients / π€ Agents"] --> |POST /v1/chat/completions| Auth
UI["π» Web UI"] --> |API requests| Auth
%% Auth node
Auth["π Optional OIDC"] --> |Auth?| IG1
Auth --> |Auth?| IG2
Auth --> |Auth?| IG3
%% Gateway nodes
IG1["π₯οΈ Inference Gateway"] --> P
IG2["π₯οΈ Inference Gateway"] --> P
IG3["π₯οΈ Inference Gateway"] --> P
%% Middleware Processing and Direct Routing
P["π Proxy Gateway"] --> MCP["π MCP Middleware"]
P --> |"Direct routing bypassing middleware"| Direct["π Direct Providers"]
MCP --> |"Middleware chain complete"| Providers["π€ LLM Providers"]
%% MCP Tool Servers
MCP --> MCP1["π File System Server"]
MCP --> MCP2["π Search Server"]
MCP --> MCP3["π Web Server"]
%% LLM Providers (Middleware Enhanced)
Providers --> C1["π¦ Ollama"]
Providers --> D1["π Groq"]
Providers --> E1["βοΈ OpenAI"]
%% Direct Providers (Bypass Middleware)
Direct --> C["π¦ Ollama"]
Direct --> D["π Groq"]
Direct --> E["βοΈ OpenAI"]
Direct --> G["β‘ Cloudflare"]
Direct --> H1["π¬ Cohere"]
Direct --> H2["π§ Anthropic"]
Direct --> H3["π DeepSeek"]
%% Define styles
classDef client fill:#9370DB,stroke:#333,stroke-width:1px,color:white;
classDef auth fill:#F5A800,stroke:#333,stroke-width:1px,color:black;
classDef gateway fill:#326CE5,stroke:#fff,stroke-width:1px,color:white;
classDef provider fill:#32CD32,stroke:#333,stroke-width:1px,color:white;
classDef ui fill:#FF6B6B,stroke:#333,stroke-width:1px,color:white;
classDef mcp fill:#FF69B4,stroke:#333,stroke-width:1px,color:white;
%% Apply styles
class A client;
class UI ui;
class Auth auth;
class IG1,IG2,IG3,P gateway;
class C,D,E,G,H1,H2,H3,C1,D1,E1,Providers provider;
class MCP,MCP1,MCP2,MCP3 mcp;
class Direct direct;
```
The client sends:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a pirate."
      },
      {
        "role": "user",
        "content": "Hello, world! How are you doing today?"
      }
    ]
  }'
```
Internally, the request is proxied to OpenAI; the Inference Gateway infers the provider from the model name prefix.
You can also select the provider explicitly by adding `?provider=openai` (or any other supported provider) to the URL, as shown in the example after the response below.
Finally, the client receives:
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Ahoy, matey! The seas be wild, the sun be bright, and this here pirate be ready to conquer the day! What be yer business, landlubber?",
        "role": "assistant"
      }
    }
  ],
  "created": 1741821109,
  "id": "chatcmpl-dc24995a-7a6e-4d95-9ab3-279ed82080bb",
  "model": "N/A",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
```
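For example, the same request can be routed explicitly via the query parameter. This is a minimal sketch reusing the request above; only the URL changes:

```bash
# Explicitly route to OpenAI instead of relying on model-name inference
curl -X POST "http://localhost:8080/v1/chat/completions?provider=openai" \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello, world! How are you doing today?"}
    ]
  }'
```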
To stream the tokens, simply add `"stream": true` to the request body.
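A minimal sketch of a streaming request; the response is assumed to arrive as incremental chunks in the OpenAI-compatible streaming format rather than a single JSON body:

```bash
# -N disables curl's output buffering so chunks are printed as they arrive
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a short pirate joke."}],
    "stream": true
  }'
```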
## Installation
> **Recommended**: For production deployments, running the Inference Gateway as
> a container is recommended. This provides better isolation, easier updates,
> and simplified configuration management. See [Docker](examples/docker-compose/)
> or [Kubernetes](examples/kubernetes/) deployment examples.
The Inference Gateway can also be installed as a standalone binary using the
provided install script or by downloading pre-built binaries from GitHub
releases.
### Using Install Script
The easiest way to install the Inference Gateway is using the automated install script:
**Install latest version:**
```bash
curl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | bash
```
**Install specific version:**
```bash
curl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | VERSION=v0.22.3 bash
```
**Install to custom directory:**
```bash
# Install to custom location
curl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | INSTALL_DIR=~/.local/bin bash
# Install to current directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/inference-gateway/main/install.sh | INSTALL_DIR=. bash
```
**What the script does:**
- Automatically detects your operating system (Linux/macOS) and architecture (x86_64/arm64/armv7)
- Downloads the appropriate binary from GitHub releases
- Extracts and installs to `/usr/local/bin` (or custom directory)
- Verifies the installation
**Supported platforms:**
- Linux: x86_64, arm64, armv7
- macOS (Darwin): x86_64 (Intel), arm64 (Apple Silicon)
### Manual Download
Download pre-built binaries directly from the [releases page](https://github.com/inference-gateway/inference-gateway/releases):
1. Download the appropriate archive for your platform
2. Extract the binary:
```bash
# Replace <os> and <arch> with your platform (see the exact archive names on the releases page)
tar -xzf inference-gateway_<os>_<arch>.tar.gz
```
3. Move to a directory in your PATH:
```bash
sudo mv inference-gateway /usr/local/bin/
chmod +x /usr/local/bin/inference-gateway
```
### Verify Installation
```bash
inference-gateway --version
```
### Running the Gateway
Once installed, start the gateway with your configuration:
```bash
# Set required environment variables
export OPENAI_API_KEY="your-api-key"
# Start the gateway
inference-gateway
```
For detailed configuration options, see the [Configuration](#configuration) section below.
## Middleware Control and Bypass Mechanisms
The Inference Gateway uses middleware to process requests and add capabilities
like MCP (Model Context Protocol). Clients can control which middlewares are
active using bypass headers:
### Bypass Headers
- **`X-MCP-Bypass`**: Skip MCP middleware processing
### Client Control Examples
```bash
# Use only standard tool calls (skip MCP)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "X-MCP-Bypass: true" \
-d '{
"model": "anthropic/claude-3-haiku",
"messages": [{"role": "user", "content": "Connect to external agents"}]
}'
# Skip MCP middleware for direct provider access
curl -X POST http://localhost:8080/v1/chat/completions \
-H "X-MCP-Bypass: true" \
-d '{
"model": "groq/llama-3-8b",
"messages": [{"role": "user", "content": "Simple chat without tools"}]
}'
```
### When to Use Bypass Headers
**For Performance:**
- Skip middleware processing when you don't need tool capabilities
- Reduce latency for simple chat interactions
**For Selective Features:**
- Use only standard tool calls (skip MCP): Add `X-MCP-Bypass: true`
- Direct provider access
**For Development:**
- Test middleware behavior in isolation
- Debug tool integration issues
- Ensure backward compatibility with existing applications
### How It Works Internally
The MCP middleware uses this same header to prevent infinite loops during its operation:
**MCP Processing:**
- When tools are detected in a response, the MCP agent makes up to 10 follow-up requests
- Each follow-up request includes `X-MCP-Bypass: true` to skip middleware re-processing
- This allows the agent to iterate without creating circular calls
> **Note**: These bypass headers only affect middleware processing. The core
> chat completions functionality remains available regardless of header values.
## Model Context Protocol (MCP) Integration
Enable MCP to automatically provide tools to LLMs without requiring clients to
manage them:
```bash
# Enable MCP and connect to tool servers
export MCP_ENABLE=true
export MCP_SERVERS="http://filesystem-server:3001/mcp,http://search-server:3002/mcp"
# LLMs will automatically discover and use available tools
curl -X POST http://localhost:8080/v1/chat/completions \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "List files in the current directory"}]
}'
```
The gateway automatically injects available tools into requests and handles tool
execution, making external capabilities seamlessly available to any LLM.
> **Learn more**:
> [Model Context Protocol Documentation](https://modelcontextprotocol.io/) |
> [MCP Integration Example](examples/docker-compose/mcp/)
## Metrics and Observability
The Inference Gateway provides comprehensive OpenTelemetry metrics for
monitoring performance, usage, and function/tool call activity. Metrics are
automatically exported to Prometheus format and available on port 9464 by
default.
### Enabling Metrics
```bash
# Enable telemetry and set metrics port (default: 9464)
export TELEMETRY_ENABLE=true
export TELEMETRY_METRICS_PORT=9464
# Access metrics endpoint
curl http://localhost:9464/metrics
```
### Available Metrics
#### Token Usage Metrics
Track token consumption across different providers and models:
- **`llm_usage_prompt_tokens_total`** - Counter for prompt tokens consumed
- **`llm_usage_completion_tokens_total`** - Counter for completion tokens generated
- **`llm_usage_total_tokens_total`** - Counter for total token usage
**Labels**: `provider`, `model`
```promql
# Total tokens used by OpenAI models in the last hour
sum(increase(llm_usage_total_tokens_total{provider="openai"}[1h])) by (model)
```
#### Request/Response Metrics
Monitor API performance and reliability:
- **`llm_requests_total`** - Counter for total requests processed
- **`llm_responses_total`** - Counter for responses by HTTP status code
- **`llm_request_duration`** - Histogram for end-to-end request duration (milliseconds)
**Labels**: `provider`, `request_method`, `request_path`, `status_code` (responses only)
```promql
# 95th percentile request latency by provider
histogram_quantile(0.95, sum(rate(llm_request_duration_bucket{provider=~"openai|anthropic"}[5m])) by (provider, le))
# Error rate percentage by provider
100 * sum(rate(llm_responses_total{status_code!~"2.."}[5m])) by (provider) / sum(rate(llm_responses_total[5m])) by (provider)
```
#### Function/Tool Call Metrics
Comprehensive tracking of tool executions for MCP and standard function calls:
- **`llm_tool_calls_total`** - Counter for total function/tool calls executed
- **`llm_tool_calls_success_total`** - Counter for successful tool executions
- **`llm_tool_calls_failure_total`** - Counter for failed tool executions
- **`llm_tool_call_duration`** - Histogram for tool execution duration (milliseconds)
**Labels**: `provider`, `model`, `tool_type`, `tool_name`, `error_type` (failures only)
**Tool Types**:
- `mcp` - Model Context Protocol tools (prefix: `mcp_`)
- `standard_tool_use` - Other function calls
```promql
# Tool call success rate by type
100 * sum(rate(llm_tool_calls_success_total[5m])) by (tool_type) / sum(rate(llm_tool_calls_total[5m])) by (tool_type)
# Average tool execution time by provider
sum(rate(llm_tool_call_duration_sum[5m])) by (provider) / sum(rate(llm_tool_call_duration_count[5m])) by (provider)
# Most frequently used tools
topk(10, sum(increase(llm_tool_calls_total[1h])) by (tool_name))
```
### Monitoring Setup
#### Docker Compose Example
Complete monitoring stack with Grafana dashboards:
```bash
cd examples/docker-compose/monitoring/
cp .env.example .env # Configure your API keys
docker compose up -d
# Access Grafana at http://localhost:3000 (admin/admin)
```
#### Kubernetes Example
Production-ready monitoring with Prometheus Operator:
```bash
cd examples/kubernetes/monitoring/
task deploy-infrastructure
task deploy-inference-gateway
# Access via port-forward or ingress
kubectl port-forward svc/grafana-service 3000:3000
```
### Grafana Dashboard
The included Grafana dashboard provides:
- **Real-time Metrics**: 5-second refresh rate for immediate feedback
- **Tool Call Analytics**: Success rates, duration analysis, and failure
tracking
- **Provider Comparison**: Performance metrics across all supported providers
- **Usage Insights**: Token consumption patterns and cost analysis
- **Error Monitoring**: Failed requests and tool call error classification
> **Learn more**:
> [Docker Compose Monitoring](examples/docker-compose/monitoring/) |
> [Kubernetes Monitoring](examples/kubernetes/monitoring/) |
> [OpenTelemetry Documentation](https://opentelemetry.io/)
## Supported APIs
- [OpenAI](https://platform.openai.com/)
- [Ollama](https://ollama.com/)
- [Ollama Cloud](https://ollama.com/cloud) (Preview)
- [Groq](https://console.groq.com/)
- [Cloudflare](https://www.cloudflare.com/)
- [Cohere](https://docs.cohere.com/docs/the-cohere-platform)
- [Anthropic](https://docs.anthropic.com/en/api/getting-started)
- [DeepSeek](https://api-docs.deepseek.com/)
- [Google](https://aistudio.google.com/)
- [Mistral](https://mistral.ai/)
## Configuration
The Inference Gateway can be configured using environment variables. The
supported [environment variables](./Configurations.md) are documented separately.
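As a starting point, the sketch below combines settings that appear elsewhere in this README; it is not an exhaustive configuration, and provider credentials other than `OPENAI_API_KEY` use their own variables documented in [Configurations.md](./Configurations.md):

```bash
# Provider credentials (see the installation example above)
export OPENAI_API_KEY="your-api-key"

# Optional: MCP tool discovery (see the MCP section)
export MCP_ENABLE=true
export MCP_SERVERS="http://filesystem-server:3001/mcp"

# Optional: OpenTelemetry metrics on port 9464 (see Metrics and Observability)
export TELEMETRY_ENABLE=true
export TELEMETRY_METRICS_PORT=9464

# Optional: allow image inputs for vision-capable models
export ENABLE_VISION=true

inference-gateway
```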
### Vision/Multimodal Support
To enable vision capabilities for processing images alongside text:
```bash
ENABLE_VISION=true
```
**Supported Providers with Vision:**
- OpenAI (GPT-4o, GPT-5, GPT-4.1, GPT-4 Turbo)
- Anthropic (Claude 3, Claude 4, Claude 4.5 Sonnet, Claude 4.5 Haiku)
- Google (Gemini 2.5)
- Cohere (Command A Vision, Aya Vision)
- Ollama (LLaVA, Llama 4, Llama 3.2 Vision)
- Groq (vision models)
- Mistral (Pixtral)
**Note**: Vision support is disabled by default for performance and security
reasons. When disabled, requests with image content will be rejected even if the
model supports vision.
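With vision enabled, image content is sent alongside text in the same chat completions request. The payload below is a sketch that assumes the OpenAI-style content-parts format, since the gateway proxies OpenAI-compatible chat completions; the hosted image URL is a placeholder:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe what is shown in this image."},
          {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
        ]
      }
    ]
  }'
```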
## Examples
- Using [Docker Compose](examples/docker-compose/)
- [Basic setup](examples/docker-compose/basic/) - Simple configuration with a
single provider
- [MCP Integration](examples/docker-compose/mcp/) - Model Context Protocol with
multiple tool servers
- [Hybrid deployment](examples/docker-compose/hybrid/) - Multiple providers
(cloud + local)
- [Authentication](examples/docker-compose/authentication/) - OIDC
authentication setup
- [Tools](examples/docker-compose/tools/) - Tool integration examples
- [Web UI](examples/docker-compose/ui/) - Complete setup with web interface
- Using [Kubernetes](examples/kubernetes/)
- [Basic setup](examples/kubernetes/basic/) - Simple Kubernetes deployment
- [MCP Integration](examples/kubernetes/mcp/) - Model Context Protocol in
Kubernetes
- [Agent deployment](examples/kubernetes/agent/) - Standalone agent deployment
- [Hybrid deployment](examples/kubernetes/hybrid/) - Multiple providers in
Kubernetes
- [Authentication](examples/kubernetes/authentication/) - OIDC authentication
in Kubernetes
- [Monitoring](examples/kubernetes/monitoring/) - Observability and monitoring
setup
- [TLS setup](examples/kubernetes/tls/) - TLS/SSL configuration
- [Web UI](examples/kubernetes/ui/) - Complete setup with web interface
- Using standard [REST endpoints](examples/rest-endpoints/)
## SDKs
Additional SDKs can be generated from the OpenAPI specification. The following
SDKs are currently available:
- [Typescript](https://github.com/inference-gateway/typescript-sdk)
- [Rust](https://github.com/inference-gateway/rust-sdk)
- [Go](https://github.com/inference-gateway/go-sdk)
- [Python](https://github.com/inference-gateway/python-sdk)
## CLI Tool
The Inference Gateway CLI provides a powerful command-line interface for
managing and interacting with the Inference Gateway. It offers tools for
configuration, monitoring, and management of inference services.
### CLI Key Features
- **Status Monitoring**: Check gateway health and resource usage
- **Interactive Chat**: Chat with models using an interactive interface
- **Configuration Management**: Manage gateway settings via YAML config
- **Project Initialization**: Set up local project configurations
- **Tool Execution**: LLMs can execute whitelisted commands and tools
### CLI Installation
#### Using Go Install
```bash
go install github.com/inference-gateway/cli@latest
```
#### Using CLI Install Script
```bash
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash
```
#### Manual CLI Download
Download the latest release from the
[releases page](https://github.com/inference-gateway/cli/releases).
### Quick Start
1. **Initialize project configuration:**
```bash
infer init
```
2. **Check gateway status:**
```bash
infer status
```
3. **Start an interactive chat:**
```bash
infer chat
```
For more details, see the [CLI documentation](https://github.com/inference-gateway/cli).
## License
This project is licensed under the MIT License.
## Contributing
Found a bug, missing provider, or have a feature in mind?
You're more than welcome to submit pull requests or open issues for any fixes, improvements, or new ideas!
Please read the [CONTRIBUTING.md](./CONTRIBUTING.md) for more details.
## Motivation
My motivation is to build AI Agents without being tied to a single vendor. By
avoiding vendor lock-in and supporting self-hosted LLMs from a single interface,
organizations gain both portability and data privacy. You can choose to consume
LLMs from a cloud provider or run them entirely offline with Ollama.