Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Run Ollama on Kubernetes
https://github.com/mathisve/ollama-on-k8s
- Host: GitHub
- URL: https://github.com/mathisve/ollama-on-k8s
- Owner: mathisve
- Created: 2024-11-07T00:55:35.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-19T03:48:10.000Z (about 1 month ago)
- Last Synced: 2024-11-29T23:31:48.845Z (29 days ago)
- Topics: kubernetes, ollama
- Homepage: https://youtu.be/QebLDdv13yg
- Size: 3.91 KB
- Stars: 22
- Watchers: 1
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ollama-on-k8s
## Docker
Run Ollama in a Docker container
```
docker run -it -p 11434:11434 --name ollama ollama/ollama:latest
docker exec -it ollama ollama run llama3.2
```

## K8s
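The `manifests` directory is not reproduced in this README. As a rough sketch of what the command below typically applies (every name here is an assumption, not taken from the repo), a Deployment and Service for Ollama might look like:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
```

With a Service along these lines, the `localhost:11434` API examples below can be reached from your machine via a port-forward such as `kubectl -n ollama port-forward svc/ollama 11434:11434`.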
```
kubectl apply -f manifests
```

## API
List local models
```
curl http://localhost:11434/api/tags
```

Download a model
```
curl http://localhost:11434/api/pull -d '{
"name": "llama3.2"
}'
```

Generate a response
```
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "In 10 words or fewer, explain why the sky is blue."
}'
```

Generate a response (no streaming)
```
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "In 10 words or fewer, explain why the sky is blue.",
"stream": false
}'
```

# Deploy with GPU
## Create EKS cluster with GPU
This step will take 15-20 minutes.
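The `gpu-setup/cluster-config.yaml` file is not shown in this README. A minimal eksctl ClusterConfig with a GPU node group might look roughly like this (the cluster name, region, and instance type are assumptions, not the repo's actual values):

```
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ollama-gpu      # assumed cluster name
  region: us-east-1     # assumed region
nodeGroups:
  - name: gpu-nodes
    instanceType: g4dn.xlarge   # any NVIDIA GPU instance type
    desiredCapacity: 1
```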
```
eksctl create cluster -f gpu-setup/cluster-config.yaml
```

## Get GPU operator helm chart
```
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
```

## Install GPU operator helm chart
This installs several components (device plugin, GPU driver, NVIDIA container toolkit, etc.).
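The `gpu-setup/gpu-values.yaml` file is not shown here; the repo's actual values may differ. A plausible minimal values sketch for the GPU operator chart:

```
# Hypothetical values file; adjust to the cluster's AMI.
driver:
  enabled: true    # set false if the node image already ships the NVIDIA driver
toolkit:
  enabled: true    # installs the NVIDIA container toolkit
```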
```
helm upgrade --install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator --create-namespace \
--values gpu-setup/gpu-values.yaml
```

## Deploy Ollama with GPUs
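The `gpu-manifests` directory is not reproduced here either. The key difference from the CPU manifests is typically a GPU resource limit on the container, so the scheduler places the pod on a GPU node (fragment below is illustrative, not taken from the repo):

```
# Fragment of a Deployment pod spec (illustrative)
containers:
  - name: ollama
    image: ollama/ollama:latest
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU from the device plugin
```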
```
kubectl apply -f gpu-manifests/
```

## Deploy time-slicing ConfigMap
```
kubectl delete deploy/ollama -n ollama --force
kubectl apply -f gpu-setup/time-slicing-config-map.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
kubectl rollout restart deploy/gpu-operator -n gpu-operator
```
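The `gpu-setup/time-slicing-config-map.yaml` applied above is not shown in the README. NVIDIA's time-slicing configuration generally follows the shape below; the ConfigMap name and the `any` data key must match the `kubectl patch` above, while the `replicas` count here is an assumption:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # assumed; how many pods share each physical GPU
```

Time slicing lets several pods that each request `nvidia.com/gpu: 1` share a single physical GPU, at the cost of no memory isolation between them.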