Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/inference-gateway/inference-gateway

An open-source gateway unifying multiple LLM providers, from local solutions like Ollama to major Cloud Providers such as OpenAI, Groq Cloud, Cohere, Google, and Cloudflare.
https://github.com/inference-gateway/inference-gateway

agnostic api gateway gateway-api golang inference-api infrastructure infrastructure-configuration kubernetes llm llms opensource opensource-projects opentelemetry performance proxy proxy-server self-hosted tracing

Last synced: about 9 hours ago
JSON representation

An open-source gateway unifying multiple LLM providers, from local solutions like Ollama to major Cloud Providers such as OpenAI, Groq Cloud, Cohere, Google, and Cloudflare.

Awesome Lists containing this project

README

        

Inference Gateway




CI Status



Version



License

The Inference Gateway is a proxy server designed to facilitate access to various language model APIs. It allows users to interact with different language models through a unified interface, simplifying the configuration and the process of sending requests and receiving responses from multiple LLMs, enabling an easy use of Mixture of Experts.

- [Key Features](#key-features)
- [Overview](#overview)
- [Supported API's](#supported-apis)
- [Configuration](#configuration)
- [Examples](#examples)
- [SDKs](#sdks)
- [License](#license)
- [Contributing](#contributing)
- [Motivation](#motivation)

## Key Features

- πŸ“œ **Open Source**: Available under the MIT License.
- πŸš€ **Unified API Access**: Proxy requests to multiple language model APIs, including OpenAI, Ollama, Groq, Cohere etc.
- βš™οΈ **Environment Configuration**: Easily configure API keys and URLs through environment variables.
- 🐳 **Docker Support**: Use Docker and Docker Compose for easy setup and deployment.
- ☸️ **Kubernetes Support**: Ready for deployment in Kubernetes environments.
- πŸ“Š **OpenTelemetry Tracing**: Enable tracing for the server to monitor and analyze performance.
- πŸ›‘οΈ **Production Ready**: Built with production in mind, with configurable timeouts and TLS support.
- 🌿 **Lightweight**: Includes only essential libraries and runtime, resulting in smaller size binary of ~10.8MB.
- πŸ“‰ **Minimal Resource Consumption**: Designed to consume minimal resources and have a lower footprint.
- πŸ“š **Documentation**: Well documented with examples and guides.
- πŸ§ͺ **Tested**: Extensively tested with unit tests and integration tests.
- πŸ› οΈ **Maintained**: Actively maintained and developed.
- πŸ“ˆ **Scalable**: Easily scalable and can be used in a distributed environment - with HPA in Kubernetes.
- πŸ”’ **Compliance** and Data Privacy: This project does not collect data or analytics, ensuring compliance and data privacy.
- 🏠 **Self-Hosted**: Can be self-hosted for complete control over the deployment environment.

## Overview

You can horizontally scale the Inference Gateway to handle multiple requests from clients. The Inference Gateway will forward the requests to the respective provider and return the response to the client. The following diagram illustrates the flow:

```mermaid
graph TD
A[Client] --> |POST /llms/provider/generate| Auth[Inference Gateway]
A[Client] --> |POST /llms/provider/generate| Auth[Inference Gateway]
A[Client] --> |POST /llms/provider/generate| Auth[Inference Gateway]
Auth[Optional OIDC] --> |Auth?| IG1[Inference Gateway]
Auth[Optional OIDC] --> |Auth?| IG2[Inference Gateway]
Auth[Optional OIDC] --> |Auth?| IG3[Inference Gateway]
IG1 --> P[Proxy Gateway]
IG2 --> P[Proxy Gateway]
IG3 --> P[Proxy Gateway]
P[Proxy Gateway] --> C[Ollama API]
P[Proxy Gateway] --> D[Groq API]
P[Proxy Gateway] --> E[OpenAI API]
P[Proxy Gateway] --> F[Google API]
P[Proxy Gateway] --> G[Cloudflare API]
P[Proxy Gateway] --> H[Cohere API]
```

Client is sending:

```bash
curl -X POST http://localhost:8080/llms/openai/generate
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a pirate."
},
{
"role": "user",
"content": "Hello, world! How are you doing today?"
}
],
}'
```

Client receives:

```json
{
"provider": "openai",
"response": {
"role": "assistant",
"model": "gpt-3.5-turbo",
"content": "Ahoy, matey! πŸ΄β€β˜ οΈ The seas be wild, the sun be bright, and this here pirate be ready to conquer the day! What be yer business, landlubber? 🦜"
}
}
```

## Supported API's

- [OpenAI](https://platform.openai.com/)
- [Ollama](https://ollama.com/)
- [Groq](https://console.groq.com/)
- [Google](https://aistudio.google.com/)
- [Cloudflare](https://www.cloudflare.com/)
- [Cohere](https://docs.cohere.com/docs/the-cohere-platform)

## Configuration

The Inference Gateway can be configured using environment variables. The following [environment variables](./Configurations.md) are supported.

## Examples

- Using [Docker Compose](examples/docker-compose/)
- Using [Kubernetes](examples/kubernetes/)
- Using standard [REST endpoints](examples/rest-endpoints/)

## SDKs

More SDKs could be generated using the OpenAPI specification. The following SDKs are currently available:

- [Rust](https://github.com/inference-gateway/rust-sdk)
- [Go](https://github.com/inference-gateway/go-sdk)
- [Python](https://github.com/inference-gateway/python-sdk)

## License

This project is licensed under the MIT License.

## Contributing

Found a bug, missing provider, or have a feature in mind?
You're more than welcome to submit pull requests or open issues for any fixes, improvements, or new ideas!

## Motivation

My motivation is to build AI Agents without being tied to a single vendor. By avoiding vendor lock-in and supporting self-hosted LLMs from a single interface, organizations gain both portability and data privacy. You can choose to consume LLMs from a cloud provider or run them entirely offline with Ollama.