https://github.com/adrianliechti/llama
Inference Hub for AI at Scale
ai-gateway azure-openai claude function-calling groq llama llamacpp llm mistral ollama openai openai-api rag
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/adrianliechti/llama
- Owner: adrianliechti
- License: mit
- Created: 2023-09-07T15:08:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-09T20:07:49.000Z (2 months ago)
- Last Synced: 2024-10-10T06:32:41.421Z (2 months ago)
- Topics: ai-gateway, azure-openai, claude, function-calling, groq, llama, llamacpp, llm, mistral, ollama, openai, openai-api, rag
- Language: Go
- Homepage:
- Size: 1.53 MB
- Stars: 33
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLM Platform
Open Source LLM Platform to build and deploy applications at scale
![Logo](docs/icon.png)
## Architecture
![Architecture](docs/architecture.png)
## Integrations & Configuration
### LLM Providers
#### OpenAI Platform
https://platform.openai.com/docs/api-reference
```yaml
providers:
  - type: openai
    token: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      - gpt-4o
      - gpt-4o-mini
      - text-embedding-3-small
      - text-embedding-3-large
      - whisper-1
      - dall-e-3
      - tts-1
      - tts-1-hd
```
#### Azure OpenAI Service
https://azure.microsoft.com/en-us/products/ai-services/openai-service
```yaml
providers:
  - type: openai
    url: https://xxxxxxxx.openai.azure.com
    token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      # https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
      #
      # {alias}:
      #   - id: {azure oai deployment name}

      gpt-3.5-turbo:
        id: gpt-35-turbo-16k

      gpt-4:
        id: gpt-4-32k

      text-embedding-ada-002:
        id: text-embedding-ada-002
```
#### Anthropic
https://www.anthropic.com/api
```yaml
providers:
  - type: anthropic
    token: sk-ant-apixx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      # https://docs.anthropic.com/en/docs/models-overview
      #
      # {alias}:
      #   - id: {anthropic api model name}

      claude-3-opus:
        id: claude-3-opus-20240229
```
#### Cohere
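The configuration in this section, like the Groq, Mistral AI, Replicate and Eleven Labs examples, writes the token as a `${...}` placeholder instead of a literal key. The README does not describe how these placeholders are resolved. Assuming they are expanded from environment variables, the substitution can be sketched with Go's standard `os.Expand`; the helper name `expandToken` and the example value are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandToken replaces ${NAME} placeholders in a config value using lookup.
// Names that lookup does not know expand to an empty string.
// Hypothetical helper for illustration; not taken from the project.
func expandToken(value string, lookup func(string) string) string {
	return os.Expand(value, lookup)
}

func main() {
	// Example value only; a real key would come from the shell environment.
	os.Setenv("COHERE_API_KEY", "example-key")

	fmt.Println(expandToken("${COHERE_API_KEY}", os.Getenv))
}
```

Passing the lookup function explicitly keeps the helper easy to test without touching the process environment.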
```yaml
providers:
  - type: cohere
    token: ${COHERE_API_KEY}

    # https://docs.cohere.com/docs/models
    models:
      cohere-command-r-plus:
        id: command-r-plus

      cohere-embed-multilingual-v3:
        id: embed-multilingual-v3.0
```
#### Groq
```yaml
providers:
  - type: groq
    token: ${GROQ_API_KEY}

    # https://console.groq.com/docs/models
    models:
      groq-llama-3-8b:
        id: llama3-8b-8192

      groq-whisper-1:
        id: whisper-large-v3
```
#### Mistral AI
```yaml
providers:
  - type: mistral
    token: ${MISTRAL_API_KEY}

    # https://docs.mistral.ai/getting-started/models/
    models:
      mistral-large:
        id: mistral-large-latest
```
#### Replicate
https://replicate.com/
```yaml
providers:
  - type: replicate
    token: ${REPLICATE_API_KEY}

    models:
      replicate-flux-pro:
        id: black-forest-labs/flux-pro
```
#### Ollama
https://ollama.ai
```shell
$ ollama start
$ ollama run mistral
```
```yaml
providers:
  - type: ollama
    url: http://localhost:11434

    models:
      # https://ollama.com/library
      #
      # {alias}:
      #   - id: {ollama model name with optional version}

      mistral-7b-instruct:
        id: mistral:latest
```
#### LLAMA.CPP
https://github.com/ggerganov/llama.cpp/tree/master/examples/server
```shell
# using taskfile.dev
$ task llama:server

# LLAMA.CPP Server
$ llama-server --port 9081 --log-disable --model ./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# LLAMA.CPP Server (Multimodal Model)
$ llama-server --port 9081 --log-disable --model ./models/llava-v1.5-7b-Q4_K.gguf --mmproj ./models/llava-v1.5-7b-mmproj-Q4_0.gguf

# using Docker (might be slow)
$ docker run -it --rm -p 9081:9081 -v ./models/:/models/ ghcr.io/ggerganov/llama.cpp:server --host 0.0.0.0 --port 9081 --model /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```
```yaml
providers:
  - type: llama
    url: http://localhost:9081

    models:
      - mistral-7b-instruct
```
#### Mistral.RS
https://github.com/EricLBuehler/mistral.rs
```shell
$ mistralrs-server --port 1234 --isq Q4K plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama
```
```yaml
providers:
  - type: mistralrs
    url: http://localhost:1234

    models:
      mistralrs-llama-3.1-8b:
        id: llama
```
#### WHISPER.CPP
https://github.com/ggerganov/whisper.cpp/tree/master/examples/server
```shell
# using taskfile.dev
$ task whisper:server

# WHISPER.CPP Server
$ whisper-server --port 9083 --convert --model ./models/whisper-ggml-medium.bin
```
```yaml
providers:
  - type: whisper
    url: http://localhost:9083

    models:
      - whisper
```
#### Hugging Face
https://huggingface.co/
```yaml
providers:
  - type: huggingface
    token: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      mistral-7B-instruct:
        id: mistralai/Mistral-7B-Instruct-v0.1

      huggingface-minilm-l6-2:
        id: sentence-transformers/all-MiniLM-L6-v2
```
#### Eleven Labs
```yaml
providers:
  - type: elevenlabs
    token: ${ELEVENLABS_API_KEY}

    models:
      elevenlabs-sarah:
        id: EXAVITQu4vr4xnSDxMaL

      elevenlabs-charlie:
        id: IKne3meq5aSn9XLyUdCD
```
#### LangChain / LangServe
https://python.langchain.com/docs/langserve
```yaml
providers:
  - type: langchain
    url: http://your-langchain-server:8000

    models:
      - langchain
```
### Routers
#### Round-robin Load Balancer
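A round-robin router hands each request to the next model in its list and wraps around at the end. The following is a minimal Go sketch of that selection logic, using the model names from this section's configuration. The `RoundRobin` type, `NewRoundRobin`, and `Pick` are hypothetical names for illustration and are not taken from the project's source.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin cycles through a fixed list of model names.
// Hypothetical type for illustration only.
type RoundRobin struct {
	models []string
	next   atomic.Uint64 // number of picks handed out so far
}

// NewRoundRobin creates a selector over the given models.
func NewRoundRobin(models ...string) *RoundRobin {
	return &RoundRobin{models: models}
}

// Pick returns the next model in order, wrapping around at the end.
// It is safe for concurrent use and returns "" for an empty list.
func (r *RoundRobin) Pick() string {
	if len(r.models) == 0 {
		return ""
	}
	n := r.next.Add(1) - 1
	return r.models[n%uint64(len(r.models))]
}

func main() {
	lb := NewRoundRobin("llama-3-8b", "groq-llama-3-8b", "huggingface-llama-3-8b")

	// The fourth pick wraps back to the first model.
	for i := 0; i < 4; i++ {
		fmt.Println(lb.Pick())
	}
}
```

An atomic counter avoids a mutex on the hot path. A router with this behaviour is configured by listing the candidate models: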
```yaml
routers:
  llama-lb:
    type: roundrobin
    models:
      - llama-3-8b
      - groq-llama-3-8b
      - huggingface-llama-3-8b
```
### Vector Databases / Indexes
#### Chroma
https://www.trychroma.com
```shell
# using Docker
$ docker run -it --rm -p 9083:8000 -v chroma-data:/chroma/chroma ghcr.io/chroma-core/chroma
```
```yaml
indexes:
  docs:
    type: chroma
    url: http://localhost:9083
    namespace: docs
    embedder: text-embedding-ada-002
```
#### Weaviate
https://weaviate.io
```shell
# using Docker
$ docker run -it --rm -p 9084:8080 -v weaviate-data:/var/lib/weaviate -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true -e PERSISTENCE_DATA_PATH=/var/lib/weaviate semitechnologies/weaviate
```
```yaml
indexes:
  docs:
    type: weaviate
    url: http://localhost:9084
    namespace: Document
    embedder: text-embedding-ada-002
```
#### Qdrant
```shell
$ docker run -p 6333:6333 qdrant/qdrant:v1.11.4
```
```yaml
indexes:
  docs:
    type: qdrant
    url: http://localhost:6333
    namespace: docs
    embedder: text-embedding-ada-002
```
#### In-Memory
```yaml
indexes:
  docs:
    type: memory
    embedder: text-embedding-ada-002
```
#### OpenSearch / Elasticsearch
```shell
# using Docker
docker run -it --rm -p 9200:9200 -v opensearch-data:/usr/share/opensearch/data -e "discovery.type=single-node" -e DISABLE_SECURITY_PLUGIN=true opensearchproject/opensearch:latest
```
```yaml
indexes:
  docs:
    type: elasticsearch
    url: http://localhost:9200
    namespace: docs
```
### Extractor
#### Tika
```shell
# using Docker
docker run -it --rm -p 9998:9998 apache/tika:3.0.0.0-BETA2-full
```
```yaml
extractors:
  tika:
    type: tika
    url: http://localhost:9998
    chunkSize: 4000
    chunkOverlap: 200
```
#### Unstructured
https://unstructured.io
```shell
# using Docker
docker run -it --rm -p 9085:8000 quay.io/unstructured-io/unstructured-api:0.0.75 --port 8000 --host 0.0.0.0
```
```yaml
extractors:
  unstructured:
    type: unstructured
    url: http://localhost:9085
```
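The `chunkSize` and `chunkOverlap` settings shown for the Tika extractor suggest that extracted text is split into fixed-size windows whose neighbours share some content. The README does not say whether sizes are measured in characters or tokens, or how the project performs the split. The Go sketch below assumes sizes count Unicode code points (runes); the `chunk` function is a hypothetical illustration, not the project's implementation.

```go
package main

import "fmt"

// chunk splits text into windows of at most size runes, where each window
// starts size-overlap runes after the previous one, so neighbours share
// overlap runes. It returns nil for invalid parameters or empty text.
// Hypothetical helper; the unit (runes) is an assumption.
func chunk(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}

	runes := []rune(text)
	step := size - overlap

	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))

		// Stop once the window reaches the end of the text, so no
		// trailing chunk consisting only of overlap is produced.
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// Small numbers for readability; the config above uses 4000 and 200.
	for _, c := range chunk("abcdefghij", 4, 1) {
		fmt.Printf("%q\n", c)
	}
	// Prints "abcd", "defg", "ghij" on separate lines.
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.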