# Inference Gateway


The Inference Gateway is a proxy server designed to facilitate access to various language model APIs. It lets users interact with different language models through a unified interface, simplifying configuration and the process of sending requests to and receiving responses from multiple LLMs, and making it easy to use a Mixture of Experts approach.

- [Key Features](#key-features)
- [Overview](#overview)
- [Supported APIs](#supported-apis)
- [Configuration](#configuration)
- [Examples](#examples)
- [SDKs](#sdks)
- [License](#license)
- [Contributing](#contributing)
- [Motivation](#motivation)

## Key Features

- 📜 **Open Source**: Available under the MIT License.
- 🚀 **Unified API Access**: Proxy requests to multiple language model APIs, including OpenAI, Ollama, Groq, Cohere, and more.
- ⚙️ **Environment Configuration**: Easily configure API keys and URLs through environment variables.
- 🔧 **Tool-use Support**: Enable function calling capabilities across supported providers with a unified API (see the sketch after this list).
- 🌊 **Streaming Responses**: Stream tokens in real time as they are generated by language models.
- 🐳 **Docker Support**: Use Docker and Docker Compose for easy setup and deployment.
- ☸️ **Kubernetes Support**: Ready for deployment in Kubernetes environments.
- 📊 **OpenTelemetry**: Monitor and analyze performance.
- 🛡️ **Production Ready**: Built with production in mind, with configurable timeouts and TLS support.
- 🌿 **Lightweight**: Includes only essential libraries and runtime, resulting in a small binary of ~10.8 MB.
- 📉 **Minimal Resource Consumption**: Designed to consume minimal resources and keep a low footprint.
- 📚 **Documentation**: Well documented with examples and guides.
- 🧪 **Tested**: Extensively tested with unit tests and integration tests.
- 🛠️ **Maintained**: Actively maintained and developed.
- 📈 **Scalable**: Easily scalable and usable in a distributed environment, for example with an HPA in Kubernetes.
- 🔒 **Compliance and Data Privacy**: This project does not collect data or analytics, ensuring compliance and data privacy.
- 🏠 **Self-Hosted**: Can be self-hosted for complete control over the deployment environment.
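As a quick illustration of the tool-use feature, the sketch below passes an OpenAI-style `tools` array through the unified chat completions endpoint. This is a minimal sketch: the `get_weather` tool and its parameters are hypothetical, and the request body is assumed to follow the OpenAI-compatible format used in the examples further down.

```bash
# Minimal tool-use sketch. The get_weather tool is a hypothetical example;
# the "tools" field is assumed to follow the OpenAI-compatible schema.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "What is the weather in Berlin today?" }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": { "type": "string" }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
```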

## Overview

You can horizontally scale the Inference Gateway to handle requests from many clients. The gateway forwards each request to the respective provider and returns the response to the client. The following diagram illustrates the flow:

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#326CE5', 'primaryTextColor': '#fff', 'lineColor': '#5D8AA8', 'secondaryColor': '#006100' }, 'fontFamily': 'Arial', 'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'padding': 15}}}%%

graph TD
%% Client nodes
A["πŸ‘₯ Clients / πŸ€– Agents"] --> |POST /v1/chat/completions| Auth

%% Auth node
Auth["πŸ”’ Optional OIDC"] --> |Auth?| IG1
Auth --> |Auth?| IG2
Auth --> |Auth?| IG3

%% Gateway nodes
IG1["πŸ–₯️ Inference Gateway"] --> P
IG2["πŸ–₯️ Inference Gateway"] --> P
IG3["πŸ–₯️ Inference Gateway"] --> P

%% Proxy and providers
P["πŸ”Œ Proxy Gateway"] --> C["πŸ¦™ Ollama"]
P --> D["πŸš€ Groq"]
P --> E["☁️ OpenAI"]
P --> G["⚑ Cloudflare"]
P --> H1["πŸ’¬ Cohere"]
P --> H2["🧠 Anthropic"]
P --> H3["πŸ‹ DeepSeek"]

%% Define styles
classDef client fill:#9370DB,stroke:#333,stroke-width:1px,color:white;
classDef auth fill:#F5A800,stroke:#333,stroke-width:1px,color:black;
classDef gateway fill:#326CE5,stroke:#fff,stroke-width:1px,color:white;
classDef provider fill:#32CD32,stroke:#333,stroke-width:1px,color:white;

%% Apply styles
class A client;
class Auth auth;
class IG1,IG2,IG3,P gateway;
class C,D,E,G,H1,H2,H3 provider;
```

The client sends:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a pirate."
      },
      {
        "role": "user",
        "content": "Hello, world! How are you doing today?"
      }
    ]
  }'
```

Internally, the request is proxied to OpenAI; the Inference Gateway infers the provider from the model name.

You can also select the provider explicitly by appending `?provider=openai` (or any other supported provider) to the URL.
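For example, a minimal sketch of the same request with the provider stated explicitly in the query string:

```bash
# Same request, but the provider is selected explicitly via ?provider=openai
# instead of being inferred from the model name.
curl -X POST "http://localhost:8080/v1/chat/completions?provider=openai" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      { "role": "user", "content": "Hello, world! How are you doing today?" }
    ]
  }'
```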

Finally, the client receives:

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Ahoy, matey! 🏴‍☠️ The seas be wild, the sun be bright, and this here pirate be ready to conquer the day! What be yer business, landlubber? 🦜",
        "role": "assistant"
      }
    }
  ],
  "created": 1741821109,
  "id": "chatcmpl-dc24995a-7a6e-4d95-9ab3-279ed82080bb",
  "model": "N/A",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
```

To stream the tokens as they are generated, simply add `"stream": true` to the request body.
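A minimal streaming sketch based on the request above (`-N` disables curl's output buffering; the chunk format is assumed to follow the OpenAI-compatible streaming style):

```bash
# Streaming sketch: identical to the earlier request, plus "stream": true.
# -N keeps curl from buffering so tokens print as they arrive.
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -d '{
    "model": "gpt-3.5-turbo",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Hello, world! How are you doing today?" }
    ]
  }'
```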

## Supported APIs

- [OpenAI](https://platform.openai.com/)
- [Ollama](https://ollama.com/)
- [Groq](https://console.groq.com/)
- [Cloudflare](https://www.cloudflare.com/)
- [Cohere](https://docs.cohere.com/docs/the-cohere-platform)
- [Anthropic](https://docs.anthropic.com/en/api/getting-started)
- [DeepSeek](https://api-docs.deepseek.com/)

## Configuration

The Inference Gateway can be configured using environment variables; see the list of supported [environment variables](./Configurations.md) for details.
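As a rough illustration only, configuration might look like the sketch below; the variable names are hypothetical, and the authoritative list lives in [Configurations.md](./Configurations.md).

```bash
# Hypothetical variable names for illustration; consult Configurations.md
# for the actual keys and defaults supported by the gateway.
export OPENAI_API_KEY="sk-..."     # hypothetical: key for the OpenAI provider
export SERVER_PORT="8080"          # hypothetical: port the gateway listens on
export ENABLE_TELEMETRY="true"     # hypothetical: toggle OpenTelemetry export
```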

## Examples

- Using [Docker Compose](examples/docker-compose/)
- Using [Kubernetes](examples/kubernetes/)
- Using standard [REST endpoints](examples/rest-endpoints/)

## SDKs

Additional SDKs can be generated from the OpenAPI specification. The following SDKs are currently available:

- [TypeScript](https://github.com/inference-gateway/typescript-sdk)
- [Rust](https://github.com/inference-gateway/rust-sdk)
- [Go](https://github.com/inference-gateway/go-sdk)
- [Python](https://github.com/inference-gateway/python-sdk)

## License

This project is licensed under the MIT License.

## Contributing

Found a bug, missing provider, or have a feature in mind?
You're more than welcome to submit pull requests or open issues for any fixes, improvements, or new ideas!

Please read the [CONTRIBUTING.md](./CONTRIBUTING.md) for more details.

## Motivation

My motivation is to build AI Agents without being tied to a single vendor. By avoiding vendor lock-in and supporting self-hosted LLMs from a single interface, organizations gain both portability and data privacy. You can choose to consume LLMs from a cloud provider or run them entirely offline with Ollama.