https://github.com/idvoretskyi/ollama-k8s
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/idvoretskyi/ollama-k8s
- Owner: idvoretskyi
- License: mit
- Created: 2025-05-12T16:42:05.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-20T16:06:34.000Z (12 months ago)
- Last Synced: 2025-07-12T06:38:25.590Z (11 months ago)
- Language: Shell
- Size: 65.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Ollama on Kubernetes
This repository contains Kubernetes manifests to deploy Ollama and Open WebUI on a Kubernetes cluster. This setup allows you to run large language models (LLMs) locally in your Kubernetes environment.
🚀 **Quick start:** Run `./scripts/quick-start.sh` to begin
Try it online instantly with no local setup! [Open in Codespaces](https://codespaces.new/idvoretskyi/ollama-k8s)
## Directory Structure
The repository is organized into two top-level directories:
- `k8s/` - Kubernetes manifests (standard and codespaces variants)
- `scripts/` - Shell scripts for deployment and management operations
## What's Included
- **Ollama**: A framework for running LLMs locally
- **Open WebUI**: A web interface for interacting with Ollama models
- **Helper Scripts**: For pulling models and port-forwarding
- **GitHub Codespaces Support**: Run everything in the cloud
- **Quick Start Script**: Simple menu to get started (`./scripts/quick-start.sh`)
## Prerequisites
- A running Kubernetes cluster (tested on local Kubernetes with 24GB RAM)
- kubectl installed and configured
- At least 8GB of RAM available for running models (more for larger models)
- About 1.5GB of RAM for the WebUI and Ollama service components
## Hardware Recommendations
- For small models (Llama3-8B, Mistral-7B, Phi3-mini): At least 8GB RAM
- For medium models (CodeLlama-7B, etc.): At least 16GB RAM
- For larger models (Llama3-70B, etc.): 24GB+ RAM (may require GPU)
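As a rough illustration, the tiers above can be written as a small lookup. This is only a sketch: the function name `model_tier_for_ram` is hypothetical and not part of this repository, and the thresholds are copied from the list above rather than measured.

```bash
#!/usr/bin/env bash
# Map available memory (whole GB) to the model tiers listed above.
# Hypothetical helper: thresholds mirror this README's recommendations.
model_tier_for_ram() {
  local ram_gb=$1
  if [ "$ram_gb" -ge 24 ]; then
    echo "large"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "medium"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "small"
  else
    echo "insufficient"
  fi
}
```

For example, `model_tier_for_ram 16` prints `medium`. Remember that the WebUI and Ollama service components need their own memory on top of what the model uses.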
## Quick Setup
Run the quick start script to deploy everything in one step:
```bash
./scripts/quick-start.sh
```
## Manual Deployment
1. Create the namespace:
```bash
kubectl apply -f k8s/standard/namespace.yaml
```
2. Create the persistent volume claim:
```bash
kubectl apply -f k8s/standard/pvc.yaml
```
3. Deploy Ollama:
```bash
kubectl apply -f k8s/standard/deployment.yaml
```
4. Create the Ollama service:
```bash
kubectl apply -f k8s/standard/service.yaml
```
5. Deploy the Web UI:
```bash
kubectl apply -f k8s/standard/webui-deployment.yaml
```
6. Create the Web UI service:
```bash
kubectl apply -f k8s/standard/webui-service.yaml
```
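The six steps above can also be run as one loop. The following is a minimal sketch, not the contents of `quick-start.sh`; the function names `manifests` and `deploy_all` are made up for illustration, while the file paths come from the steps above.

```bash
#!/usr/bin/env bash
# Manifest files in the order given by the manual steps above.
manifests() {
  printf 'k8s/standard/%s\n' \
    namespace.yaml pvc.yaml deployment.yaml service.yaml \
    webui-deployment.yaml webui-service.yaml
}

# Apply each manifest in turn, stopping at the first failure.
# Paths contain no spaces, so plain word splitting is safe here.
deploy_all() {
  local file
  for file in $(manifests); do
    kubectl apply -f "$file" || return 1
  done
}
```

Run `deploy_all` from the repository root. The namespace is applied first because the remaining resources are created inside it.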
## Accessing Ollama
Once deployed, you can access the Ollama API at:
```
http://<EXTERNAL-IP>:11434
```
To get the external IP:
```
kubectl get svc ollama -n ollama
```
## Web UI
For a user-friendly interface, we've included the Open WebUI for Ollama.
Once deployed, you can access the UI at:
```
http://<EXTERNAL-IP>:8080
```
To get the WebUI external IP:
```
kubectl get svc ollama-webui -n ollama
```
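If you want the addresses in a script rather than reading them from the table output, something like the following may work. It is a sketch that assumes both Services are of type `LoadBalancer` and that your provider reports an IP address (some report a hostname instead). The helper names `service_url` and `external_ip` are hypothetical.

```bash
#!/usr/bin/env bash
# Build a URL from a host and port.
service_url() {
  local host=$1 port=$2
  echo "http://${host}:${port}"
}

# Read the first load balancer IP of a Service in the ollama namespace.
# Prints nothing if no external IP has been assigned yet.
external_ip() {
  local svc=$1
  kubectl get svc "$svc" -n ollama \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

For example, `service_url "$(external_ip ollama)" 11434` prints the Ollama API address, and `service_url "$(external_ip ollama-webui)" 8080` prints the WebUI address.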
### Port Forwarding
If you're running on a local Kubernetes cluster and want to access Ollama and WebUI through localhost, use the included port-forwarding script:
```bash
./scripts/port-forward.sh
```
This will make Ollama available at http://localhost:11434 and the WebUI at http://localhost:8080.
Press Ctrl+C to stop port forwarding when you're done.
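The script's contents are not reproduced in this README. As a rough idea of how such forwarding can be done by hand, here is a sketch built on `kubectl port-forward`. It assumes the Services are named `ollama` and `ollama-webui` and expose ports 11434 and 8080; the function names are hypothetical and this is not the repository's implementation.

```bash
#!/usr/bin/env bash
# Forward one Service port in the ollama namespace to the same port on localhost.
forward() {
  local svc=$1 port=$2
  kubectl port-forward -n ollama "svc/${svc}" "${port}:${port}"
}

# Run both forwards in the background and stop them together on Ctrl+C.
forward_all() {
  forward ollama 11434 &
  forward ollama-webui 8080 &
  # Background jobs ignore SIGINT in non-interactive shells, so kill them explicitly.
  trap 'kill $(jobs -p) 2>/dev/null' INT TERM
  wait
}
```

Calling `forward_all` blocks until you interrupt it, which matches the Ctrl+C behaviour described above.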
## Running models
Ollama will be available at http://<EXTERNAL-IP>:11434 (or http://localhost:11434 when port-forwarding). You can interact with it using:
1. The Open WebUI (recommended)
2. The Ollama CLI (if port-forwarding)
3. REST API calls
For detailed usage instructions, see the examples below.
Quick example API call:
```bash
curl -X POST http://<EXTERNAL-IP>:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```
## Pulling models
### Using the Helper Script
We've included a helper script to make pulling models easier:
```bash
./scripts/pull-model.sh llama3
```
You can replace `llama3` with any model available in the [Ollama Library](https://ollama.ai/library).
### Manual Method
Alternatively, you can pull a model by executing directly into the pod:
```bash
kubectl exec -it -n ollama $(kubectl get pods -n ollama -l app=ollama -o jsonpath='{.items[0].metadata.name}') -- ollama pull llama3
```
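The one-liner above can be split into two steps, which also makes it easy to pull several models in a row. This is a sketch with hypothetical function names (`ollama_pod`, `pull_models`); it reuses the `app=ollama` selector from the command above and drops `-it` because no interactive terminal is needed.

```bash
#!/usr/bin/env bash
# Name of the first pod labelled app=ollama (same selector as the command above).
ollama_pod() {
  kubectl get pods -n ollama -l app=ollama \
    -o jsonpath='{.items[0].metadata.name}'
}

# Pull each model given as an argument, stopping at the first failure.
pull_models() {
  local pod model
  pod=$(ollama_pod) || return 1
  [ -n "$pod" ] || { echo "no Ollama pod found" >&2; return 1; }
  for model in "$@"; do
    kubectl exec -n ollama "$pod" -- ollama pull "$model" || return 1
  done
}
```

For example, `pull_models llama3 mistral` pulls both models into the same pod, one after the other.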
## Available Models
Here are some recommended models that work well with 8GB RAM:
- **llama3**: The latest Meta Llama 3 model (8B parameters)
- **mistral**: Mistral 7B model
- **phi3-mini**: Microsoft's Phi-3 mini model (3.8B parameters)
- **phi3**: Microsoft's Phi-3 small model (7B parameters)
- **codellama:7b**: Specialized for code generation (7B parameters)
For more information on available models, visit the [Ollama Library](https://ollama.ai/library).
## Usage Examples
### Chat in Web UI
The simplest way to interact with models is through the WebUI:
1. Access the WebUI at http://localhost:8080 (if port-forwarded) or http://<EXTERNAL-IP>:8080
2. When prompted, configure the API endpoint to be http://ollama:11434 (or http://localhost:11434 if accessing from outside the cluster)
3. Select a model (you may need to pull it first using the `./scripts/pull-model.sh` script)
4. Start chatting!
### API Usage
You can interact with the Ollama API directly:
```bash
# Generate a response
curl -X POST http://<EXTERNAL-IP>:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Write a haiku about Kubernetes"
}'
# List available models
curl http://<EXTERNAL-IP>:11434/api/tags
```
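By default `/api/generate` streams its answer as a series of JSON objects. If you would rather receive a single JSON object, the request can set `"stream": false`. The sketch below wraps that in two helpers. The function names are hypothetical, and the base URL is passed in so the same call works with port-forwarding or an external IP.

```bash
#!/usr/bin/env bash
# Build a JSON body for /api/generate with streaming turned off.
# Note: model and prompt are inserted verbatim, so they must not contain
# double quotes or backslashes; use a JSON tool such as jq for arbitrary text.
generate_payload() {
  local model=$1 prompt=$2
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$model" "$prompt"
}

# POST the payload to an Ollama endpoint (base URL passed as the first argument).
generate() {
  local base_url=$1 model=$2 prompt=$3
  curl -s -X POST "${base_url}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(generate_payload "$model" "$prompt")"
}
```

For example, `generate http://localhost:11434 llama3 "Write a haiku about Kubernetes"` returns one JSON object whose `response` field holds the generated text.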
## GitHub Codespaces
This repository works great with GitHub Codespaces! Here's how to get started:
1. Click the Codespaces button above or [launch directly](https://codespaces.new/idvoretskyi/ollama-k8s)
2. Run the setup script: `./scripts/codespaces-setup.sh` (if needed)
3. Deploy with optimized settings: `./scripts/codespaces-start.sh`
4. Pull a small model: `./scripts/pull-model.sh phi3-mini`
5. Access the WebUI through the ports tab
Recommended models for Codespaces (8GB RAM limit):
- `phi3-mini` (3.8B parameters) - Compact but powerful
- `llama3` (8B parameters) - Meta's latest model
- `mistral:7b` (7B parameters) - Excellent performance
- `codellama:7b` (7B parameters) - Code generation
## Troubleshooting
### Pod won't start
- Check resources: `kubectl describe pod -n ollama`
- The Ollama image is large and may take time to download initially
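Because the first image pull can be slow, it may help to wait for readiness explicitly and to look at recent events if the wait times out. The helpers below are a sketch with hypothetical names; the `app=ollama` label is the selector used in the manual pull command in this README.

```bash
#!/usr/bin/env bash
# Block until the Ollama pod reports Ready, or give up after the timeout.
wait_for_ollama() {
  local timeout=${1:-600s}
  kubectl wait --for=condition=Ready pod -l app=ollama \
    -n ollama --timeout="$timeout"
}

# Show events in the namespace, sorted by their last timestamp.
recent_events() {
  kubectl get events -n ollama --sort-by=.lastTimestamp
}
```

`wait_for_ollama 120s` waits up to two minutes. If it fails, `recent_events` usually shows whether the pod is still pulling the image or cannot be scheduled.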
### Model runs out of memory
- Use a smaller model (phi3-mini instead of llama3)
- Increase memory limits in deployment.yaml
- Consider enabling swap if supported
### WebUI can't connect to Ollama
- Check both pods are running: `kubectl get pods -n ollama`
- Verify OLLAMA_API_BASE_URL is set to "http://ollama:11434"
- Test service discovery from inside the WebUI pod: `kubectl exec -it -n ollama <webui-pod-name> -- curl ollama:11434/api/version`
### WebUI crashes with OOMKilled
- Increase memory in webui-deployment.yaml:
```yaml
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
```
- Apply changes: `kubectl apply -f k8s/standard/webui-deployment.yaml`
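To confirm that a crash really was an out-of-memory kill, and to watch the rollout after changing the limits, helpers along these lines may be useful. This is a sketch: the function names are hypothetical, and the pod and deployment names are passed as arguments because they are not stated in this README.

```bash
#!/usr/bin/env bash
# Print why each container in the given pod last terminated (e.g. OOMKilled).
last_termination_reason() {
  local pod=$1
  kubectl get pod "$pod" -n ollama \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Wait for a deployment in the ollama namespace to finish rolling out.
rollout_status() {
  local deployment=$1
  kubectl rollout status "deployment/${deployment}" -n ollama
}
```

Get the pod name from `kubectl get pods -n ollama` and the deployment name from `kubectl get deployments -n ollama`, then pass them to these functions.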
## Contributing
Contributions are welcome! Here's how you can contribute to this project:
1. Fork the repository
2. Create a new branch (`git checkout -b feature/your-feature-name`)
3. Make your changes
4. Commit your changes (`git commit -m 'Add some feature'`)
5. Push to the branch (`git push origin feature/your-feature-name`)
6. Open a Pull Request
Please ensure your code follows the existing style and includes appropriate documentation.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.