An open API service indexing awesome lists of open source software.

https://github.com/mhajder/openwebui-stack

Docker Compose stack for Open WebUI with LiteLLM proxy, observability, and security best practices.
https://github.com/mhajder/openwebui-stack

ai docker docker-compose litellm open-webui openwebui otel qdrant traefik

Last synced: 4 months ago
JSON representation

Docker Compose stack for Open WebUI with LiteLLM proxy, observability, and security best practices.

Awesome Lists containing this project

README

          

# Open WebUI Stack

Docker Compose stack for [Open WebUI](https://github.com/open-webui/open-webui) with LiteLLM proxy, observability, and security best practices.

## Architecture

```mermaid
graph TB
Internet[Internet]
Traefik[Traefik
Reverse Proxy]
OpenWebUI[Open WebUI]
LiteLLM[LiteLLM
Proxy]
LGTM[LGTM Stack
Grafana/Loki/Tempo/Mimir]
Qdrant[Qdrant
Vector DB]
Valkey[Valkey
Shared Cache]
PostgreSQL[PostgreSQL
Shared DB]
OTEL[OpenTelemetry
Collector]
Exporters[Exporters
Node/PostgreSQL/GPU]

Internet -->|HTTPS| Traefik
Traefik --> OpenWebUI
Traefik --> LiteLLM
Traefik --> LGTM

OpenWebUI --> Qdrant
OpenWebUI --> Valkey
OpenWebUI --> LiteLLM

LiteLLM --> Valkey
LiteLLM --> PostgreSQL

Exporters --> PostgreSQL
Exporters --> OTEL
OTEL --> LGTM
```

## Features

- **Traefik Reverse Proxy**: Automatic HTTPS with self-signed certificates generated by Traefik, HTTP to HTTPS redirect
- **LiteLLM Proxy**: Unified gateway for multiple LLM providers (OpenAI, Anthropic, Gemini, Ollama)
- **LGTM Observability Stack**: Grafana, Loki, Tempo, Mimir for logs, traces, and metrics
- **Qdrant Vector Database**: Production-ready vector search for RAG
- **PostgreSQL**: Shared database for Open WebUI and LiteLLM
- **OpenTelemetry**: Full observability with distributed tracing
- **Security**: Network isolation, secure headers, rate limiting

## Quick Start

### Prerequisites

- Docker Engine 24.0+
- Docker Compose v2.20+
- 8GB+ RAM recommended
- (Optional) NVIDIA GPU with drivers for GPU metrics

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/mhajder/openwebui-stack.git
cd openwebui-stack
```

2. **Run the setup script**
```bash
chmod +x scripts/*.sh
./scripts/setup.sh
```

In `docker-compose.yml` change `ENABLE_SIGNUP` to `true` if you want to register the first admin user via Open WebUI. You can disable it again after the first user is created.

3. **Start the stack**
```bash
docker compose --profile monitoring --profile gpu up -d
```

4. **Access the services**
- Open WebUI: https://localhost
- Grafana: https://grafana.localhost
- LiteLLM: https://litellm.localhost

### Local Domain Resolution

Add these entries to your `/etc/hosts`:
```
127.0.0.1 localhost grafana.localhost litellm.localhost traefik.localhost
```

## Configuration

### Environment Variables

Key environment variables in `.env`:

| Variable | Description | Example |
|----------|-------------|---------|
| `COMPOSE_PROJECT_NAME` | Docker Compose project name | `openwebui-stack` |
| `DOMAIN` | Base domain for services | `localhost` or `example.com` |
| `POSTGRES_USER` | PostgreSQL admin user | `postgres` |
| `POSTGRES_PASSWORD` | PostgreSQL admin password | Generated by setup script |
| `OPENWEBUI_DB_NAME` | Open WebUI database name | `openwebui` |
| `OPENWEBUI_DB_USER` | Open WebUI database user | `openwebui` |
| `OPENWEBUI_DB_PASSWORD` | Open WebUI database password | Generated by setup script |
| `LITELLM_DB_NAME` | LiteLLM database name | `litellm` |
| `LITELLM_DB_USER` | LiteLLM database user | `litellm` |
| `LITELLM_DB_PASSWORD` | LiteLLM database password | Generated by setup script |
| `OPENWEBUI_SECRET_KEY` | Secret key for Open WebUI sessions | Generated by setup script |
| `LITELLM_MASTER_KEY` | Master API key for LiteLLM | Generated by setup script |
| `LITELLM_SALT_KEY` | Salt key for LiteLLM encryption | Generated by setup script |
| `TRAEFIK_DASHBOARD_USER` | Traefik dashboard username | `admin` |
| `TRAEFIK_DASHBOARD_PASSWORD` | Traefik dashboard password (plain) | Generated by setup script |
| `TRAEFIK_DASHBOARD_PASSWORD_HASH` | Traefik dashboard password (apr1 hash) | Auto-generated by setup script |
| `GF_SECURITY_ADMIN_PASSWORD` | Grafana admin password | Generated by setup script |

### Services Overview

#### Core Services

- **Traefik**: Reverse proxy with automatic HTTPS, routing, and load balancing
- **Open WebUI**: Web UI for interacting with LLMs
- **LiteLLM**: Unified gateway for multiple LLM providers
- **PostgreSQL**: Shared relational database for Open WebUI and LiteLLM
- **Valkey**: High-performance Redis alternative for caching and sessions

#### Storage & Search

- **Qdrant**: Vector database for semantic search and RAG operations

#### Observability (LGTM Stack)

- **LGTM** (Grafana OTel): Integrated stack providing:
- **Prometheus/Mimir**: Metrics collection and storage
- **Loki**: Centralized log aggregation
- **Tempo**: Distributed tracing
- **Grafana**: Visualization and dashboards

#### Exporters & Collectors

- **Node Exporter**: System-level metrics (CPU, memory, disk, network)
- **OpenTelemetry Collector**: Collects and forwards metrics, logs, and traces
- **PostgreSQL Exporter** (optional profile): Database metrics
- **NVIDIA GPU Exporter** (optional profile): GPU metrics (requires GPU)

### Production Deployment

For production environments:

1. **Use proper certificates**
- Replace self-signed certs with Let's Encrypt or CA-signed certificates
- Update [traefik/traefik.yml](traefik/traefik.yml) for ACME configuration (Let's Encrypt)
- Example ACME configuration:
```yaml
certificatesResolvers:
letsencrypt:
acme:
email: admin@example.com
storage: /data/acme.json
httpChallenge:
entryPoint: web
```

2. **Update security settings**
- Change default passwords in `.env` (setup script generates strong ones automatically)
- Enable authentication on all service dashboards
- Configure rate limiting and WAF rules in Traefik

3. **Enable GPU support** (if available)
```bash
docker compose --profile gpu up -d
```
This activates the NVIDIA GPU Exporter for monitoring GPU metrics.

4. **Configure persistent backups**
- Use external volume drivers for data persistence
- Set up automated backup scripts using cron
- Store backups in secure, off-site locations

5. **Network security**
- Use VPN or IP whitelisting for external access
- Configure firewall rules (UFW, iptables)
- Use private networks when possible

6. **Database optimization**
- Configure PostgreSQL replication for HA
- Set up automated backup and recovery procedures
- Monitor database performance and storage

7. **Monitoring and alerting**
- Configure alert rules in Grafana for critical metrics
- Set up notification channels (email, Slack, PagerDuty)
- Implement log retention policies in Loki

## Monitoring

### Grafana Dashboards

Access Grafana at **https://grafana.localhost** with default credentials:
- **Username**: `admin`
- **Password**: Set in `.env` as `GF_SECURITY_ADMIN_PASSWORD` (generated by setup script)

#### Pre-configured Dashboards

The stack includes several pre-configured dashboards:

1. **litellm-dashboard.json** - LiteLLM proxy metrics and performance
- Request rates and latencies
- Token usage and costs
- Error rates by model
- Model provider health

2. **node-exporter-dashboard.json** - System metrics
- CPU, memory, and disk usage
- Network I/O
- Process metrics
- System uptime

3. **openwebui-dashboard.json** - Open WebUI application metrics
- Request rates and response times
- User activity
- Message counts
- API endpoint performance

4. **postgresql-dashboard.json** - Database metrics
- Query performance
- Connection pool status
- Cache hit rates
- Replication lag (if configured)

5. **traefik-dashboard.json** - Reverse proxy metrics
- Request rates by service
- Response time percentiles
- HTTP status codes
- SSL/TLS certificate expiry

6. **opentelemetry-dashboard.json** - System-wide observability
- Distributed traces
- Span analysis
- Service dependencies
- Error tracking

7. **nvidia-dcgm-dashboard.json** - GPU metrics (if GPU profile enabled)
- GPU memory usage
- Compute utilization
- Temperature monitoring
- Power consumption

### Available Metrics

**Prometheus/Mimir** stores these metric types:

| Category | Metrics |
|----------|---------|
| **System** | CPU load, memory usage, disk I/O, network traffic |
| **Database** | Connection count, query latency, transaction rate |
| **HTTP** | Request rate, response time, status codes, error rate |
| **Cache** | Hit/miss rates, eviction rates, memory usage |
| **Application** | Custom metrics from services |
| **GPU** | Memory usage, compute, temperature, power (if enabled) |

### Traces in Tempo

OpenTelemetry traces are automatically collected and stored in Tempo:

- Access via Grafana → Explore → Tempo
- View distributed traces across all services
- Analyze service dependencies
- Identify performance bottlenecks
- Track errors through the system

## Additional Resources

- [Open WebUI Documentation](https://docs.openwebui.com/)
- [LiteLLM Proxy Docs](https://docs.litellm.ai/docs/proxy/configs)
- [Traefik Documentation](https://traefik.io/traefik/)
- [Grafana LGTM Stack](https://grafana.com/products/lgtm-stack/)
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [OpenTelemetry](https://opentelemetry.io/)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.