
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching

⚠️ DEPRECATION WARNING: [LiteLLM](https://github.com/BerriAI/litellm) is our new home. You can find the LiteLLM Proxy there. Thank you for checking us out! ❤️

### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models

[![PyPI Version](https://img.shields.io/pypi/v/litellm.svg)](https://pypi.org/project/litellm/)
[![PyPI Version](https://img.shields.io/badge/stable%20version-v0.1.345-blue?color=green&link=https://pypi.org/project/litellm/0.1.1/)](https://pypi.org/project/litellm/0.1.1/)
![Downloads](https://img.shields.io/pypi/dm/litellm)
[![litellm](https://img.shields.io/badge/%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere%7CReplicate%7CHugging%20Face-blue?color=green)](https://github.com/BerriAI/litellm)

[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/DYqQAW?referralCode=t3ukrU)

![4BC6491E-86D0-4833-B061-9F54524B2579](https://github.com/BerriAI/litellm/assets/17561003/f5dd237b-db5e-42e1-b1ac-f05683b1d724)

## Usage
**Step 1: Put your API keys in .env**

Copy `.env.template` to `.env` and fill in the relevant keys (e.g. `OPENAI_API_KEY="sk-.."`).

**Step 2: Test your proxy**

Start your proxy server:
```shell
$ cd litellm-proxy && python3 main.py
```

Make your first call:
```python
import openai

# point the OpenAI SDK at the local proxy instead of api.openai.com
openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
)

print(response)
```

## What does liteLLM proxy do?

- Make `/chat/completions` requests for 50+ LLM models: **Azure, OpenAI, Replicate, Anthropic, Hugging Face**

Example: for `model`, use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, or `stabilityai/stablecode-completion-alpha-3b-4k`

```json
{
  "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
  "messages": [
    {
      "content": "Hello, what's the weather in San Francisco?",
      "role": "user"
    }
  ]
}
```

- **Consistent Input/Output Format**
  - Call all models using the OpenAI format - `completion(model, messages)`
  - Text responses will always be available at `['choices'][0]['message']['content']`
- **Error Handling** using model fallbacks (if `GPT-4` fails, try `llama2`)
- **Logging** - Log requests, responses and errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Traceloop`, `Helicone` (any of the supported providers listed at https://docs.litellm.ai/docs/)

**Example: Logs sent to Supabase**

- **Token Usage & Spend** - Track input + completion tokens used, plus spend per model
- **Caching** - Implementation of semantic caching
- **Streaming & Async Support** - Return generators to stream text responses (a streaming sketch follows this list)
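
For example, streaming works through the same OpenAI client shown in the usage section. A minimal sketch, reusing the key and base URL from the usage example above and assuming the proxy relays OpenAI-style stream chunks:

```python
import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

# stream=True makes the client return a generator of
# incremental chunks instead of one response object
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    stream=True,
)

for chunk in response:
    # each chunk carries a partial message under choices[0]["delta"]
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```

The async path should look the same via `openai.ChatCompletion.acreate` (openai-python 0.27+), since the proxy mimics the OpenAI API.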

## API Endpoints

### `/chat/completions` (POST)

This endpoint generates chat completions for 50+ supported LLM models, e.g. Llama2, GPT-4, Claude 2.

#### Input

This endpoint accepts a raw JSON body with the following fields:

- `model` (string, required): ID of the model to use for chat completions, e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`. See all supported models [here](https://docs.litellm.ai/docs/).
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (`system`, `user`, `assistant`, or `function`), `content` (the message text), and `name` (for the `function` role).
- Additional optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://docs.litellm.ai/docs/
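
The optional parameters ride along in the same request body. A sketch of passing a few of them through the OpenAI client from the usage section (the values here are illustrative, not recommendations):

```python
import openai

# proxy setup from the usage section
openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    temperature=0.7,  # optional: sampling temperature
    top_p=1,          # optional: nucleus-sampling cutoff
    n=1,              # optional: number of completions to return
)
```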

#### Example JSON body

For claude-2

```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, what's the weather in San Francisco?",
      "role": "user"
    }
  ]
}
```

### Making an API request to the Proxy Server

```python
import requests
import json

# TODO: use your proxy URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "content": "Hello, what's the weather in San Francisco?",
            "role": "user"
        }
    ]
})
headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=payload)
print(response.text)
```
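
Because every model comes back in the OpenAI response format (see the next section), the reply text sits at the same path regardless of backend. A small sketch continuing from the `requests` example above:

```python
# continuing from the requests example above
data = response.json()
print(data["choices"][0]["message"]["content"])
```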

### Output [Response Format]

All responses from the server are returned in the following format (for all LLM models). More info on the output format here: https://docs.litellm.ai/docs/

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
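
The `usage` block carries the kind of token counts the **Token Usage & Spend** feature above tracks, and clients can read them straight off the parsed response. A minimal sketch, reusing `data` from the request example:

```python
usage = data["usage"]
print(f"prompt: {usage['prompt_tokens']}, "
      f"completion: {usage['completion_tokens']}, "
      f"total: {usage['total_tokens']}")
```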

## Installation & Usage

### Running Locally

1. Clone the liteLLM-proxy repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
```
2. Install the required dependencies using pip:
```
pip install -r requirements.txt
```
3. (optional) Set your LiteLLM proxy master key:
```
os.environ['LITELLM_PROXY_MASTER_KEY'] = "YOUR_LITELLM_PROXY_MASTER_KEY"
```
or set `LITELLM_PROXY_MASTER_KEY` in your `.env` file
4. Set your LLM API keys:
```
os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
```
or set `OPENAI_API_KEY` in your `.env` file
5. Run the server:
```
python main.py
```

## Deploying

1. Quick Start: Deploy on Railway

[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/DYqQAW?referralCode=t3ukrU)

2. `GCP`, `AWS`, `Azure`
This project includes a `Dockerfile`, so you can build and deploy a Docker image on your own cloud provider.

# Support / Talk with founders

- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ [email protected] / [email protected]

## Roadmap

- [ ] Support hosted DBs (e.g. Supabase)
- [ ] Easily send data to places like Posthog and Sentry
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits
- [ ] Implement user-based rate limiting
- [ ] Spending controls per project - expose a key-creation endpoint
- [ ] Need to store a keys DB -> mapping created keys to their alias (i.e. project name)
- [ ] Easily add new models as backups / as the entry point (add this to the available model list)