https://github.com/otwld/ollama-helm

Helm chart for Ollama on Kubernetes

Host: GitHub
URL: https://github.com/otwld/ollama-helm
Owner: otwld
License: mit
Created: 2023-11-18T14:23:18.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-03-29T16:55:16.000Z (6 months ago)
Last Synced: 2025-04-01T04:52:42.075Z (6 months ago)
Topics: helm, kubernetes, ollama
Language: Smarty
Homepage: https://artifacthub.io/packages/helm/ollama-helm/ollama
Size: 518 KB
Stars: 402
Watchers: 7
Forks: 61
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE


![otwld ollama helm chart banner](./banner.png)

![GitHub License](https://img.shields.io/github/license/otwld/ollama-helm)
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/ollama-helm)](https://artifacthub.io/packages/helm/ollama-helm/ollama)
[![Helm Lint and Test](https://github.com/otwld/ollama-helm/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/otwld/ollama-helm/actions/workflows/ci.yaml)
[![Discord](https://img.shields.io/badge/Discord-OTWLD-blue?logo=discord&logoColor=white)](https://discord.gg/U24mpqTynB)

[Ollama](https://ollama.ai/): get up and running with large language models locally.

This community chart deploys [Ollama](https://github.com/ollama/ollama) on Kubernetes.

## Requirements

- Kubernetes: `>= 1.16.0-0` for **CPU only**

- Kubernetes: `>= 1.26.0-0` for **GPU** stable support (NVIDIA and AMD)

*Not all GPUs are currently supported by Ollama (especially AMD GPUs).*
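A minimal sketch of GPU values for an AMD card, based only on the `ollama.gpu.*` descriptions in the Helm Values table and assuming your cluster already exposes AMD GPUs to pods (for example through a device plugin):

```yaml
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- 'amd' makes the chart use the ROCm image variant
    # (only when image.tag is not overridden)
    type: 'amd'
    # -- Number of GPUs to allocate
    number: 1
```

Per the `ollama.gpu.type` description, `amd` adds a `rocm` suffix to the image tag unless `image.tag` is overridden, because the AMD and CPU/CUDA builds are separate images.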

## Deploying Ollama chart

To install the `ollama` chart in the `ollama` namespace:

```console
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama --create-namespace
```

## Upgrading Ollama chart

First, read the Ollama [release notes](https://github.com/ollama/ollama/releases) to make sure there are no
backward-incompatible changes.

Make adjustments to your values as needed, then run `helm upgrade`:

```console
# -- Refresh the repo index so that the latest ollama chart version is used.
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --values values.yaml
```
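Because `image.tag` defaults to the chart `appVersion`, upgrading the chart can also upgrade Ollama itself. If you prefer to move the chart and Ollama independently, one option is to pin the tag in your `values.yaml`. In this sketch `<ollama-version>` is a hypothetical placeholder, not a real tag:

```yaml
image:
  repository: ollama/ollama
  # -- Hypothetical placeholder: replace with an Ollama release you have validated.
  # Leaving tag empty ("") falls back to the chart appVersion.
  tag: "<ollama-version>"
  pullPolicy: IfNotPresent
```

With `ollama.gpu.type: 'amd'`, a pinned tag is used as-is (no `rocm` suffix is added), so it would need to point at a ROCm image yourself.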

## Uninstalling Ollama chart

To uninstall/delete the `ollama` deployment in the `ollama` namespace:

```console
helm delete ollama --namespace ollama
```

Substitute your values if they differ from the examples. See `helm delete --help` for a full reference on `delete`
parameters and flags.
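Uninstalling removes the resources the release created, which typically includes a PVC created by this chart and, depending on your storage class reclaim policy, the models stored on it. If model data should outlive the release, a sketch based on the `persistentVolume.existingClaim` value is to bring your own PVC, in which case the chart does not create its default PVC. The claim name `ollama-models` is hypothetical:

```yaml
persistentVolume:
  # -- Required for existingClaim to take effect
  enabled: true
  # -- Hypothetical name of a PVC you created (and that is bound) before installing
  existingClaim: ollama-models
```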

## Interact with Ollama

- **Ollama documentation can be found [HERE](https://github.com/ollama/ollama/tree/main/docs)**
- Interact with RESTful API: [Ollama API](https://github.com/ollama/ollama/blob/main/docs/api.md)
- Interact with the official client libraries: [ollama-js](https://github.com/ollama/ollama-js#custom-client)
and [ollama-python](https://github.com/ollama/ollama-python#custom-client)
- Interact with langchain: [langchain-js](https://github.com/ollama/ollama/blob/main/docs/tutorials/langchainjs.md)
and [langchain-python](https://github.com/ollama/ollama/blob/main/docs/tutorials/langchainpy.md)
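All of these clients talk to the Ollama HTTP API, which the chart exposes on `service.port` (`11434` by default) through a `ClusterIP` service. To reach it from outside the cluster without an Ingress, a minimal sketch using the `service.*` values from the Helm Values table is to switch to a NodePort service:

```yaml
service:
  # -- Expose the API on every node's IP
  type: NodePort
  # -- In-cluster service port (Ollama's default API port)
  port: 11434
  # -- External port on each node; must be inside the cluster's NodePort
  # range (30000 to 32767 by default)
  nodePort: 31434
```

Clients would then point at `http://<node-ip>:31434`, where `<node-ip>` is any reachable node address.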

## Examples

- **Running a recent version of Kubernetes is highly recommended when deploying Ollama with GPU support**

### Basic values.yaml example with GPU and two models pulled at startup

```yaml
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true

    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'

    # -- Number of GPUs to allocate
    number: 1

  # -- List of models to pull at container startup
  models:
    pull:
      - mistral
      - llama2
```
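On NVIDIA GPUs partitioned with MIG, the `ollama.gpu.nvidiaResource` description suggests requesting a slice by changing the resource name. A sketch, assuming your NVIDIA device plugin advertises a `nvidia.com/mig-1g.10gb` resource (the actual name depends on your MIG profile):

```yaml
ollama:
  gpu:
    enabled: true
    type: 'nvidia'
    # -- Request a MIG slice instead of a full GPU
    # (example resource name taken from the values table)
    nvidiaResource: "nvidia.com/mig-1g.10gb"
    # -- Number of slices to request
    number: 1
```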

---

### Basic values.yaml example with Ingress

```yaml
ollama:
  models:
    pull:
      - llama2

ingress:
  enabled: true
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
```

- *The API is now reachable at `ollama.domain.lan`*
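To serve the API over HTTPS, `ingress.tls` holds the TLS configuration for the covered hostnames. A sketch assuming its entries follow the standard Kubernetes Ingress `spec.tls` shape (`secretName` plus `hosts`), that a TLS Secret named `ollama-tls` already exists in the release namespace, and that `nginx` is the IngressClass installed in your cluster (both names are hypothetical):

```yaml
ingress:
  enabled: true
  # -- Hypothetical IngressClass name
  className: nginx
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
  tls:
    # -- Hypothetical Secret holding the certificate for ollama.domain.lan
    - secretName: ollama-tls
      hosts:
        - ollama.domain.lan
```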

---

### Create and run a model from a template

```yaml
ollama:
  models:
    create:
      - name: llama3.1-ctx32768
        template: |
          FROM llama3.1
          PARAMETER num_ctx 32768
    run:
      - llama3.1-ctx32768
```
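Instead of an inline `template`, `ollama.models.create` also accepts a Modelfile stored in a ConfigMap; per the Helm Values table, ConfigMaps must be created beforehand and are mounted as a volume in the `/models` directory. A sketch reusing the placeholder names from that table, where the `configmap-key` entry is assumed to contain the Modelfile text (for example the same `FROM llama3.1` and `PARAMETER num_ctx 32768` lines):

```yaml
ollama:
  models:
    create:
      # -- Model built from a Modelfile stored in a ConfigMap
      - name: llama3.1-ctx32768
        # -- Placeholder ConfigMap name; it must exist before the pod starts
        configMapRef: my-configmap
        # -- Key inside the ConfigMap that holds the Modelfile content
        configMapKeyRef: configmap-key
    run:
      - llama3.1-ctx32768
```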

## Upgrading from 0.X.X to 1.X.X

Version 1.X.X introduces the ability to load models into memory at startup, so the structure of the values has changed.

Before upgrading, change `ollama.models` to `ollama.models.pull` to avoid errors:

```yaml
ollama:
  models:
    - mistral
    - llama2
```

To:

```yaml
ollama:
  models:
    pull:
      - mistral
      - llama2
```
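The load-at-startup behavior itself comes from the new `ollama.models.run` key. A sketch combining both keys, so that both models are downloaded at startup but only `mistral` is also loaded into memory (each preloaded model consumes RAM or VRAM):

```yaml
ollama:
  models:
    # -- Downloaded at container startup if not already present
    pull:
      - mistral
      - llama2
    # -- Loaded into memory at container startup
    run:
      - mistral
```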

## Helm Values

- See [values.yaml](values.yaml) for the chart's default values.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity for pod assignment |
| autoscaling.enabled | bool | `false` | Enable autoscaling |
| autoscaling.maxReplicas | int | `100` | Maximum number of replicas |
| autoscaling.minReplicas | int | `1` | Minimum number of replicas |
| autoscaling.targetCPUUtilizationPercentage | int | `80` | Target CPU utilization percentage for autoscaling |
| deployment.labels | object | `{}` | Labels to add to the Deployment |
| extraArgs | list | `[]` | Additional arguments on the output Deployment definition |
| extraEnv | list | `[]` | Additional environment variables on the output Deployment definition. For extra Ollama environment variables, see https://github.com/ollama/ollama/blob/main/envconfig/config.go |
| extraEnvFrom | list | `[]` | Additional environment variables from external sources (like ConfigMap) |
| extraObjects | list | `[]` | Extra Kubernetes manifests to deploy |
| fullnameOverride | string | `""` | String to fully override the fullname template |
| hostIPC | bool | `false` | Use the host's IPC namespace |
| hostNetwork | bool | `false` | Use the host's network namespace |
| hostPID | bool | `false` | Use the host's PID namespace |
| image.pullPolicy | string | `"IfNotPresent"` | Docker image pull policy |
| image.repository | string | `"ollama/ollama"` | Docker image repository |
| image.tag | string | `""` | Docker image tag; overrides the image tag, whose default is the chart appVersion |
| imagePullSecrets | list | `[]` | Docker registry secret names as an array |
| ingress.annotations | object | `{}` | Additional annotations for the Ingress resource |
| ingress.className | string | `""` | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) |
| ingress.enabled | bool | `false` | Enable the Ingress resource |
| ingress.hosts[0].host | string | `"ollama.local"` | |
| ingress.hosts[0].paths[0].path | string | `"/"` | |
| ingress.hosts[0].paths[0].pathType | string | `"Prefix"` | |
| ingress.tls | list | `[]` | TLS configuration for the hostnames covered by this Ingress |
| initContainers | list | `[]` | Init containers to add to the pod |
| knative.annotations | object | `{}` | Knative service annotations |
| knative.containerConcurrency | int | `0` | Knative service container concurrency |
| knative.enabled | bool | `false` | Enable Knative integration |
| knative.idleTimeoutSeconds | int | `300` | Knative service idle timeout in seconds |
| knative.responseStartTimeoutSeconds | int | `300` | Knative service response start timeout in seconds |
| knative.timeoutSeconds | int | `300` | Knative service timeout in seconds |
| lifecycle | object | `{}` | Lifecycle for pod assignment (overrides the `ollama.models` startup pull/run) |
| livenessProbe.enabled | bool | `true` | Enable livenessProbe |
| livenessProbe.failureThreshold | int | `6` | Failure threshold for livenessProbe |
| livenessProbe.initialDelaySeconds | int | `60` | Initial delay seconds for livenessProbe |
| livenessProbe.path | string | `"/"` | Request path for livenessProbe |
| livenessProbe.periodSeconds | int | `10` | Period seconds for livenessProbe |
| livenessProbe.successThreshold | int | `1` | Success threshold for livenessProbe |
| livenessProbe.timeoutSeconds | int | `5` | Timeout seconds for livenessProbe |
| nameOverride | string | `""` | String to partially override the fullname template (keeps the release name) |
| namespaceOverride | string | `""` | String to fully override the namespace |
| nodeSelector | object | `{}` | Node labels for pod assignment |
| ollama.gpu.enabled | bool | `false` | Enable GPU integration |
| ollama.gpu.mig.devices | object | `{}` | MIG devices and the corresponding number of each |
| ollama.gpu.mig.enabled | bool | `false` | Enable multiple MIG devices. If enabled, you must specify the MIG devices; if set to false, this section is ignored |
| ollama.gpu.number | int | `1` | Number of GPUs. Ignored if you use the MIG section |
| ollama.gpu.nvidiaResource | string | `"nvidia.com/gpu"` | NVIDIA cards only; change to (for example) `nvidia.com/mig-1g.10gb` to use a MIG slice |
| ollama.gpu.type | string | `"nvidia"` | GPU type: `nvidia` or `amd`. If `ollama.gpu.enabled` is true, the default is `nvidia`. If set to `amd`, a `rocm` suffix is added to the image tag unless `image.tag` is overridden, because AMD and CPU/CUDA use different images |
| ollama.insecure | bool | `false` | Add the insecure flag when pulling models at container startup |
| ollama.models.create | list | `[]` | List of models to create at container startup. Two options: 1. create a raw model from a `template`; 2. load a model from a ConfigMap (ConfigMaps must be created beforehand and are mounted as a volume in the `/models` directory). Example entries: `{name: llama3.1-ctx32768, configMapRef: my-configmap, configMapKeyRef: configmap-key}` or `{name: llama3.1-ctx32768, template: "FROM llama3.1\nPARAMETER num_ctx 32768"}` |
| ollama.models.pull | list | `[]` | List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present. Example: `pull: [llama2, mistral]` |
| ollama.models.run | list | `[]` | List of models to load into memory at container startup. Example: `run: [llama2, mistral]` |
| ollama.mountPath | string | `""` | Override the ollama-data volume mount path (default: `/root/.ollama`) |
| ollama.port | int | `11434` | |
| persistentVolume.accessModes | list | `["ReadWriteOnce"]` | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner. Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ |
| persistentVolume.annotations | object | `{}` | Ollama server data Persistent Volume annotations |
| persistentVolume.enabled | bool | `false` | Enable persistence using a PVC |
| persistentVolume.existingClaim | string | `""` | To bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart will not create the default PVC. Requires `persistentVolume.enabled: true` |
| persistentVolume.size | string | `"30Gi"` | Ollama server data Persistent Volume size |
| persistentVolume.storageClass | string | `""` | Ollama server data Persistent Volume StorageClass. If defined, it is used as `storageClassName`. If set to `"-"`, `storageClassName: ""` is set, which disables dynamic provisioning. If undefined (the default) or null, no `storageClassName` is set and the default provisioner is used (gp2 on AWS, standard on GKE, AWS and OpenStack) |
| persistentVolume.subPath | string | `""` | Subdirectory of the Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty |
| persistentVolume.volumeMode | string | `""` | Ollama server data Persistent Volume Binding Mode. If defined, it is used as `volumeMode`. If empty (the default) or null, no `volumeBindingMode` spec is set and the default mode is used |
| persistentVolume.volumeName | string | `""` | Pre-existing PV to attach this claim to. Useful if a CSI driver auto-provisions a PV for you and you want to always reference that PV |
| podAnnotations | object | `{}` | Map of annotations to add to the pods |
| podLabels | object | `{}` | Map of labels to add to the pods |
| podSecurityContext | object | `{}` | Pod Security Context |
| readinessProbe.enabled | bool | `true` | Enable readinessProbe |
| readinessProbe.failureThreshold | int | `6` | Failure threshold for readinessProbe |
| readinessProbe.initialDelaySeconds | int | `30` | Initial delay seconds for readinessProbe |
| readinessProbe.path | string | `"/"` | Request path for readinessProbe |
| readinessProbe.periodSeconds | int | `5` | Period seconds for readinessProbe |
| readinessProbe.successThreshold | int | `1` | Success threshold for readinessProbe |
| readinessProbe.timeoutSeconds | int | `3` | Timeout seconds for readinessProbe |
| replicaCount | int | `1` | Number of replicas |
| resources.limits | object | `{}` | Pod limits |
| resources.requests | object | `{}` | Pod requests |
| runtimeClassName | string | `""` | Runtime class to use |
| securityContext | object | `{}` | Container Security Context |
| service.annotations | object | `{}` | Annotations to add to the service |
| service.labels | object | `{}` | Labels to add to the service |
| service.loadBalancerIP | string | `nil` | Load Balancer IP address |
| service.nodePort | int | `31434` | Service node port when the service type is `NodePort` |
| service.port | int | `11434` | Service port |
| service.type | string | `"ClusterIP"` | Service type |
| serviceAccount.annotations | object | `{}` | Annotations to add to the service account |
| serviceAccount.automount | bool | `true` | Automatically mount a ServiceAccount's API credentials? |
| serviceAccount.create | bool | `true` | Specifies whether a service account should be created |
| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
| tolerations | list | `[]` | Tolerations for pod assignment |
| topologySpreadConstraints | object | `{}` | Topology Spread Constraints for pod assignment |
| updateStrategy.type | string | `"Recreate"` | Deployment strategy: `Recreate` or `RollingUpdate`. Default is `Recreate` |
| volumeMounts | list | `[]` | Additional volumeMounts on the output Deployment definition |
| volumes | list | `[]` | Additional volumes on the output Deployment definition |
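Since `persistentVolume.enabled` defaults to `false`, pulled models (often several gigabytes each) are likely lost whenever the pod is recreated, assuming the chart falls back to non-persistent storage in that case, so the models under `ollama.models.pull` would be downloaded again. A sketch of a values file that enables persistence, sets pod resources, and passes an Ollama setting through `extraEnv`; the sizes, resource figures, and the `24h` keep-alive are illustrative assumptions, not recommendations:

```yaml
persistentVolume:
  # -- Keep pulled models across pod restarts
  enabled: true
  # -- Illustrative size; 30Gi is the chart default
  size: 50Gi

resources:
  # -- Illustrative figures; size them for the models you run
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    memory: 16Gi

extraEnv:
  # -- How long models stay loaded in memory after use
  # (see envconfig/config.go for other Ollama settings)
  - name: OLLAMA_KEEP_ALIVE
    value: "24h"
```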

----------------------------------------------

## Core team

Jean Baptiste Detroyes

Nathan Tréhout

## Support

- For questions, suggestions, and discussion about Ollama please refer to
the [Ollama issue page](https://github.com/ollama/ollama/issues)
- For questions, suggestions, and discussion about this chart please
visit [Ollama-Helm issue page](https://github.com/otwld/ollama-helm/issues) or join
our [OTWLD Discord](https://discord.gg/U24mpqTynB)
