Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Private Open AI on Kubernetes
- Host: GitHub
- URL: https://github.com/substratusai/kubeai
- Owner: substratusai
- License: apache-2.0
- Created: 2023-10-21T00:59:51.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-30T05:15:58.000Z (about 1 month ago)
- Last Synced: 2024-09-30T05:27:47.777Z (about 1 month ago)
- Topics: ai, autoscaler, faster-whisper, inference-operator, k8s, kubernetes, llm, ollama, ollama-operator, openai-api, vllm, vllm-operator, whisper
- Language: Go
- Homepage: https://www.kubeai.org
- Size: 9.1 MB
- Stars: 343
- Watchers: 9
- Forks: 32
- Open Issues: 41
Metadata Files:
- Readme: docs/README.md
- Contributing: docs/contributing/development-environment.md
- License: LICENSE
README
# KubeAI: Private Open AI on Kubernetes
Get inferencing running on Kubernetes: LLMs, Embeddings, Speech-to-Text.
✅️ Drop-in replacement for OpenAI with API compatibility
🧠 Serve top OSS models (LLMs, Whisper, etc.)
🚀 Multi-platform: CPU-only, GPU (TPU support coming soon)
⚖️ Scale from zero, autoscale based on load
🛠️ Zero dependencies (does not depend on Istio, Knative, etc.)
💬 Chat UI included ([OpenWebUI](https://github.com/open-webui/open-webui))
🤖 Operates OSS model servers (vLLM, Ollama, FasterWhisper, Infinity)
✉ Stream/batch inference via messaging integrations (Kafka, PubSub, etc.)

Quotes from the community:
> reusable, well abstracted solution to run LLMs - [Mike Ensor](https://www.linkedin.com/posts/mikeensor_gcp-solutions-public-retail-edge-available-cluster-traits-activity-7237515920259104769-vBs9?utm_source=share&utm_medium=member_desktop)
## Architecture
KubeAI serves an OpenAI compatible HTTP API. Admins can configure ML models via `kind: Model` Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator (See [Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)) that manages [vLLM](https://github.com/vllm-project/vllm) and [Ollama](https://github.com/ollama/ollama) servers.
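For illustration, here is a minimal sketch of a `Model` resource, assuming the `kubeai.org/v1` API group and field names from the KubeAI docs (the CRD schema of your installed version is authoritative; values here are illustrative, not a tested config):

```yaml
# Illustrative Model manifest; field values are assumptions.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: gemma2-2b-cpu
spec:
  features: [TextGeneration]   # capability this model provides
  url: ollama://gemma2:2b      # where the engine pulls weights from
  engine: OLlama               # which managed server runs it (e.g. OLlama, VLLM)
  resourceProfile: cpu:2       # maps to preconfigured resource requests/limits
  minReplicas: 1               # keep one replica warm; 0 enables scale-from-zero
```

Admins apply such manifests with `kubectl apply -f model.yaml`, and KubeAI reconciles the corresponding model server Pods.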
## Local Quickstart
Create a local cluster using [kind](https://kind.sigs.k8s.io/) or [minikube](https://minikube.sigs.k8s.io/docs/).
TIP: If you are using Podman for kind...
Make sure your Podman machine can use up to 6G of memory (by default it is capped at 2G):

```bash
# You might need to stop and remove the existing machine:
podman machine stop
podman machine rm

# Init and start a new machine:
podman machine init --memory 6144 --disk-size 120
podman machine start
```

```bash
kind create cluster # OR: minikube start
```

Add the KubeAI [Helm](https://helm.sh/docs/intro/install/) repository.
```bash
helm repo add kubeai https://www.kubeai.org
helm repo update
```

Install KubeAI and wait for all components to be ready (may take a minute).
```bash
helm install kubeai kubeai/kubeai --wait --timeout 10m
```

Install some predefined models.
```bash
cat <<EOF > kubeai-models.yaml
catalog:
  gemma2-2b-cpu:
    enabled: true
    minReplicas: 1
  qwen2-500m-cpu:
    enabled: true
  nomic-embed-text-cpu:
    enabled: true
EOF

helm install kubeai-models kubeai/models \
    -f ./kubeai-models.yaml
```

Before progressing to the next steps, start a watch on Pods in a separate terminal to see how KubeAI deploys models.
```bash
kubectl get pods --watch
```

#### Interact with Gemma2
Because we set `minReplicas: 1` for the Gemma model, you should see a model Pod already coming up.
Start a local port-forward to the bundled chat UI.
```bash
kubectl port-forward svc/openwebui 8000:80
```

Now open your browser to [localhost:8000](http://localhost:8000) and select the Gemma model to start chatting with.
#### Scale up Qwen2 from Zero
If you go back to the browser and start a chat with Qwen2, you will notice that it will take a while to respond at first. This is because we set `minReplicas: 0` for this model and KubeAI needs to spin up a new Pod (you can verify with `kubectl get models -oyaml qwen2-500m-cpu`).
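You can also trigger the scale-up through the API instead of the UI. A minimal sketch, assuming the KubeAI Service is named `kubeai`, listens on port 80, and serves the OpenAI-compatible API under an `/openai` prefix (verify these against your install):

```bash
# Port-forward the KubeAI service (name and port are assumptions; check `kubectl get svc`).
kubectl port-forward svc/kubeai 8000:80

# In another terminal: send a completion request. KubeAI holds the request while it
# scales qwen2-500m-cpu up from zero, then returns the response.
curl http://localhost:8000/openai/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-500m-cpu", "prompt": "Why is the sky blue?", "max_tokens": 20}'
```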
## Documentation
Check out our documentation on [kubeai.org](https://www.kubeai.org) to find info on:
* Installing KubeAI in the cloud
* How-to guides (e.g. how to manage models and resource profiles)
* Concepts (how the components of KubeAI work)
* How to contribute

## Adopters
List of known adopters:
| Name | Description | Link |
| ---- | ----------- | ---- |
| Telescope | Telescope uses KubeAI for multi-region large scale batch LLM inference. | [trytelescope.ai](https://trytelescope.ai) |
| Google Cloud Distributed Edge | KubeAI is included as a reference architecture for inferencing at the edge. | [LinkedIn](https://www.linkedin.com/posts/mikeensor_gcp-solutions-public-retail-edge-available-cluster-traits-activity-7237515920259104769-vBs9?utm_source=share&utm_medium=member_desktop), [GitLab](https://gitlab.com/gcp-solutions-public/retail-edge/available-cluster-traits/kubeai-cluster-trait) |

If you are using KubeAI and would like to be listed as an adopter, please make a PR.
## OpenAI API Compatibility
```bash
# Implemented #
/v1/chat/completions
/v1/completions
/v1/embeddings
/v1/models
/v1/audio/transcriptions

# Planned #
# /v1/assistants/*
# /v1/batches/*
# /v1/fine_tuning/*
# /v1/images/*
# /v1/vector_stores/*
```
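As a sketch of what calling one of the implemented endpoints looks like, here is a hypothetical embeddings request against the `nomic-embed-text-cpu` model from the quickstart (it assumes the same `kubeai` Service port-forward and `/openai` path prefix as above):

```bash
# Hypothetical example: request embeddings from a quickstart model.
# Assumes `kubectl port-forward svc/kubeai 8000:80` is running.
curl http://localhost:8000/openai/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-cpu", "input": "KubeAI runs private, OpenAI-compatible inference on Kubernetes."}'
```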
## Immediate Roadmap

* Model caching
* LoRA finetuning (compatible with OpenAI finetuning API)
* Image generation (compatible with OpenAI images API)

*NOTE:* KubeAI was born out of a project called Lingo, a simple Kubernetes LLM proxy with basic autoscaling. We relaunched the project as KubeAI (late August 2024) and expanded the roadmap to what it is today.
🌟 Don't forget to drop us a star on GitHub and follow the repo to stay up to date!
[![KubeAI Star history Chart](https://api.star-history.com/svg?repos=substratusai/kubeai&type=Date)](https://star-history.com/#substratusai/kubeai&Date)
## Contact
Let us know about features you are interested in seeing or reach out with questions. [Visit our Discord channel](https://discord.gg/JeXhcmjZVm) to join the discussion!
Or just reach out on LinkedIn if you want to connect:
* [Nick Stogner](https://www.linkedin.com/in/nstogner/)
* [Sam Stoelinga](https://www.linkedin.com/in/samstoelinga/)