Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Run Ollama on Kubernetes
https://github.com/mathisve/ollama-on-k8s
- Host: GitHub
- URL: https://github.com/mathisve/ollama-on-k8s
- Owner: mathisve
- Created: 2024-11-07T00:55:35.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-19T03:48:10.000Z (about 1 month ago)
- Last Synced: 2024-11-29T23:31:48.845Z (29 days ago)
- Topics: kubernetes, ollama
- Homepage: https://youtu.be/QebLDdv13yg
- Size: 3.91 KB
- Stars: 22
- Watchers: 1
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ollama-on-k8s
## Docker
Run Ollama in a Docker container
```
docker run -it -p 11434:11434 --name ollama ollama/ollama:latest
docker exec -it ollama ollama run llama3.2
```

## K8s
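The `manifests` directory is not reproduced in this README. As a rough sketch of what the command below typically applies (every name here is an assumption, not taken from the repo), a Deployment and Service for Ollama might look like:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
```

With a Service along these lines, the `localhost:11434` API examples below can be reached from your machine via a port-forward such as `kubectl -n ollama port-forward svc/ollama 11434:11434`.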
```
kubectl apply -f manifests
```

## API
List local models
```
curl http://localhost:11434/api/tags
```

Download a model
```
curl http://localhost:11434/api/pull -d '{
"name": "llama3.2"
}'
```

Generate a response
```
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "In 10 words or fewer, explain why the sky is blue."
}'
```

Generate a response (no streaming)
```
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "In 10 words or fewer, explain why the sky is blue.",
"stream": false
}'
```

# Deploy with GPU
## Create EKS cluster with GPU
This step will take 15-20 minutes.
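The `gpu-setup/cluster-config.yaml` file is not shown in this README. A minimal eksctl ClusterConfig with a GPU node group might look roughly like this (the cluster name, region, and instance type are assumptions, not the repo's actual values):

```
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ollama-gpu      # assumed cluster name
  region: us-east-1     # assumed region
nodeGroups:
  - name: gpu-nodes
    instanceType: g4dn.xlarge   # any NVIDIA GPU instance type
    desiredCapacity: 1
```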
```
eksctl create cluster -f gpu-setup/cluster-config.yaml
```

## Get GPU operator helm chart
```
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
```

## Install GPU operator helm chart
This installs several components (device plugin, GPU driver, NVIDIA container toolkit, etc.).
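The `gpu-setup/gpu-values.yaml` file is not shown here; the repo's actual values may differ. A plausible minimal values sketch for the GPU operator chart:

```
# Hypothetical values file; adjust to the cluster's AMI.
driver:
  enabled: true    # set false if the node image already ships the NVIDIA driver
toolkit:
  enabled: true    # installs the NVIDIA container toolkit
```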
```
helm upgrade --install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator --create-namespace \
--values gpu-setup/gpu-values.yaml
```

## Deploy Ollama with GPUs
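The `gpu-manifests` directory is not reproduced here either. The key difference from the CPU manifests is typically a GPU resource limit on the container, so the scheduler places the pod on a GPU node (fragment below is illustrative, not taken from the repo):

```
# Fragment of a Deployment pod spec (illustrative)
containers:
  - name: ollama
    image: ollama/ollama:latest
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU from the device plugin
```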
```
kubectl apply -f gpu-manifests/
```

## Deploy time-slicing ConfigMap
```
kubectl delete deploy/ollama -n ollama --force
kubectl apply -f gpu-setup/time-slicing-config-map.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
kubectl rollout restart deploy/gpu-operator -n gpu-operator
```
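The `gpu-setup/time-slicing-config-map.yaml` applied above is not shown in the README. NVIDIA's time-slicing configuration generally follows the shape below; the ConfigMap name and the `any` data key must match the `kubectl patch` above, while the `replicas` count here is an assumption:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # assumed; how many pods share each physical GPU
```

Time slicing lets several pods that each request `nvidia.com/gpu: 1` share a single physical GPU, at the cost of no memory isolation between them.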