https://github.com/kholcomb/gateway_eks
[PoC] EKS deployment of AI-Governance stack
ai aws bedrock ec2 eks grafana litellm openwebui prometheus rds redis secretsmanager terraform
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/kholcomb/gateway_eks
- Owner: kholcomb
- Created: 2025-12-05T19:07:11.000Z (7 months ago)
- Default Branch: dev
- Last Pushed: 2026-03-26T01:12:03.000Z (3 months ago)
- Last Synced: 2026-03-26T23:19:12.054Z (3 months ago)
- Topics: ai, aws, bedrock, ec2, eks, grafana, litellm, openwebui, prometheus, rds, redis, secretsmanager, terraform
- Language: HCL
- Homepage:
- Size: 316 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Codeowners: .github/CODEOWNERS
- Security: security/ARCHITECTURE.md
README
# LiteLLM + OpenWebUI EKS Deployment
**Production-ready AI/LLM infrastructure on Amazon EKS** with comprehensive observability, security, and authentication.
## Quick Links
| Guide | Description |
|-------|-------------|
| 🚀 [Quick Start](#quick-start) | Get started in 20-35 minutes |
| 📖 [Deployment Guide](docs/DEPLOYMENT_GUIDE.md) | Complete step-by-step walkthrough |
| 🔐 [JWT Setup](docs/JWT_AUTHENTICATION_SETUP.md) | Configure Okta OIDC authentication |
| 📦 [ECR Setup](docs/ECR_SETUP.md) | Container registry configuration |
| 🤖 [MCP Deployment](docs/MCP_DEPLOYMENT.md) | Deploy Model Context Protocol servers |
| 🏗️ [MCP Operator](docs/MCP_OPERATOR_ARCHITECTURE.md) | Kubernetes operator for MCP servers |
---
## Architecture
```mermaid
graph TB
subgraph "Users"
User[Users/Clients]
end
subgraph "External Services"
Bedrock[AWS Bedrock<br/>Claude, Llama, Mistral]
RDS[Amazon RDS<br/>PostgreSQL]
SecretsManager[AWS Secrets Manager]
Okta[Okta OIDC]
end
subgraph "EKS Cluster"
subgraph "Application"
OpenWebUI[OpenWebUI<br/>Chat Frontend]
LiteLLM[LiteLLM Proxy<br/>JWT Auth + Routing]
Redis[Redis HA<br/>Caching]
end
subgraph "Observability"
Prometheus[Prometheus]
Grafana[Grafana]
Jaeger[Jaeger]
end
subgraph "Security"
ESO[External Secrets]
OPA[OPA Gatekeeper]
end
end
User -->|HTTPS| OpenWebUI
OpenWebUI -->|API + JWT| LiteLLM
LiteLLM -->|Model Requests| Bedrock
LiteLLM -->|Cache| Redis
OpenWebUI -->|Session Data| RDS
LiteLLM -->|Metrics| Prometheus
Prometheus -->|Visualize| Grafana
ESO -->|Sync Secrets| SecretsManager
style LiteLLM fill:#326CE5,color:#fff
style OpenWebUI fill:#61DAFB
```
## Components
| Component | Purpose |
|-----------|---------|
| **LiteLLM** | API gateway to AWS Bedrock models |
| **OpenWebUI** | Chat frontend with Okta authentication |
| **Redis** | Caching layer for LiteLLM (deployed in HA mode) |
| **Prometheus/Grafana** | Metrics collection & visualization |
| **Jaeger** | Distributed tracing |
| **External Secrets** | AWS Secrets Manager integration |
| **OPA Gatekeeper** | Policy enforcement |
---
## Prerequisites
### AWS Account Setup
- AWS account with appropriate permissions
- AWS CLI v2 configured (`aws configure`)
- EKS cluster permissions
### Local Tools
```bash
# macOS
brew install awscli kubectl helm
# Verify installations
aws --version # AWS CLI 2.x
kubectl version --client # v1.28+ (--client avoids contacting a cluster that does not exist yet)
helm version # v3.0+
```
---
## Quick Start
### 1. Deploy Infrastructure
```bash
# Set environment variables
export AWS_REGION=us-east-1
export EKS_CLUSTER_NAME=litellm-eks
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Choose deployment method
cd scripts
./deploy.sh infrastructure
# You'll be prompted to choose: [T]erraform or [E]ksctl
```
### 2. Configure kubectl
```bash
aws eks update-kubeconfig --name "$EKS_CLUSTER_NAME" --region "$AWS_REGION"
kubectl cluster-info # Verify connection
```
### 3. Create Required Secrets
Create Okta secrets in AWS Secrets Manager ([detailed guide](docs/JWT_AUTHENTICATION_SETUP.md)):
```bash
# LiteLLM: JWT public key URL
aws secretsmanager create-secret \
--name litellm/jwt-public-key-url \
--secret-string "https://<your-okta-domain>/oauth2/default/v1/keys" \
--region $AWS_REGION
# OpenWebUI: Session encryption, Okta client ID/secret, admin email
# See JWT_AUTHENTICATION_SETUP.md for complete secret creation steps
```
**Note:** The database URL secret (`litellm/database-url`) should already exist from the Terraform/eksctl setup.
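The JWKS URL follows a fixed pattern, so it can be derived from the Okta domain instead of being typed by hand. The helper below is a minimal sketch: `okta_jwks_url` and the `OKTA_DOMAIN` variable are hypothetical names, not part of this repository, and it assumes the Okta `default` authorization server shown in the command above.

```bash
# Build the JWKS URL for an Okta org's "default" authorization server.
# Hypothetical helper; OKTA_DOMAIN (e.g. your-org.okta.com) is an assumption.
okta_jwks_url() {
  local domain="${1:?usage: okta_jwks_url <okta-domain>}"
  # Strip an accidental scheme or trailing slash so the result is well formed.
  domain="${domain#https://}"
  domain="${domain#http://}"
  domain="${domain%/}"
  printf 'https://%s/oauth2/default/v1/keys\n' "$domain"
}

# Example (needs AWS credentials, so it is left commented out):
# aws secretsmanager create-secret \
#   --name litellm/jwt-public-key-url \
#   --secret-string "$(okta_jwks_url "$OKTA_DOMAIN")" \
#   --region "$AWS_REGION"
```

Normalizing the input first means that pasting either `your-org.okta.com` or `https://your-org.okta.com/` produces the same secret value.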
### 4. Deploy Applications
```bash
cd scripts # skip if you are still in scripts/ from step 1
./deploy.sh all
```
This deploys:
- ✅ External Secrets Operator
- ✅ OPA Gatekeeper + policies
- ✅ Prometheus/Grafana monitoring
- ✅ Jaeger distributed tracing
- ✅ Redis cluster
- ✅ LiteLLM proxy with JWT authentication
- ✅ OpenWebUI with Okta OIDC
### 5. Verify Deployment
```bash
kubectl get pods -A | grep -E 'litellm|open-webui|monitoring|redis'
kubectl get externalsecret -A # Verify secrets synced
```
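To turn the pod listing into a pass/fail check, the output can be filtered for pods that are not fully ready. This is a minimal sketch rather than part of `deploy.sh`: `unhealthy_pods` is a hypothetical helper that assumes the default column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status).

```bash
# Reads `kubectl get pods -A --no-headers` output on stdin and prints every pod
# that is neither Completed nor Running with all containers ready.
# Exits non-zero if at least one such pod is found.
unhealthy_pods() {
  awk '
    {
      split($3, ready, "/")   # READY column looks like "1/2"
      ok = ($4 == "Completed") || ($4 == "Running" && ready[1] == ready[2])
      if (!ok) { print $1 "/" $2 " " $4 " " $3; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Example (needs cluster access, so it is left commented out):
# kubectl get pods -A --no-headers \
#   | grep -E 'litellm|open-webui|monitoring|redis' \
#   | unhealthy_pods && echo "all pods ready"
```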
### 6. Access Applications
**Option A: From bastion host**
```bash
# Run from the repository root
./scripts/setup-bastion.sh create
./scripts/setup-bastion.sh connect
# Inside bastion:
llm-ui # OpenWebUI → http://localhost:8080
llm-grafana # Grafana → http://localhost:3000
```
**Option B: Port-forward from local machine**
```bash
# OpenWebUI
kubectl port-forward -n open-webui svc/open-webui 8080:80
# Grafana (default: admin / prom-operator)
kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80
```
---
## Deployment Options
The `deploy.sh` script supports granular deployment:
```bash
# Full deployment
./deploy.sh all
# Infrastructure only
./deploy.sh terraform # or: ./deploy.sh eksctl
# Individual components
./deploy.sh irsa # Create IAM roles
./deploy.sh secrets # Create AWS secrets
./deploy.sh external-secrets # Deploy External Secrets Operator
./deploy.sh redis # Deploy Redis HA
./deploy.sh litellm # Deploy LiteLLM
./deploy.sh openwebui # Deploy OpenWebUI
./deploy.sh monitoring # Deploy Prometheus/Grafana
./deploy.sh jaeger # Deploy Jaeger
./deploy.sh gatekeeper # Deploy OPA Gatekeeper
./deploy.sh verify # Verify deployment
# Complete teardown
./deploy.sh infrastructure-destroy
```
**Deployment Modes:**
- **Interactive** (default): Prompts before updating existing resources
- **Non-interactive**: `INTERACTIVE_MODE=false ./deploy.sh all`
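The two modes can be pictured as a confirmation gate that is skipped when `INTERACTIVE_MODE=false`. The function below is a hypothetical sketch of that pattern, not the code in `deploy.sh`; the name `confirm` and the `[y/N]` prompt are assumptions.

```bash
# Ask before changing an existing resource unless INTERACTIVE_MODE=false.
# Hypothetical sketch; deploy.sh may implement this differently.
confirm() {
  local prompt="$1" reply=""
  # Non-interactive runs (for example CI) proceed without asking.
  if [ "${INTERACTIVE_MODE:-true}" = "false" ]; then
    return 0
  fi
  printf '%s [y/N] ' "$prompt" >&2
  read -r reply || reply=""   # treat EOF as "no"
  case "$reply" in
    [yY]|[yY][eE][sS]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# confirm "Release 'litellm' already exists. Upgrade it?" && echo "upgrading"
```

Defaulting to "no" on empty input or EOF keeps an unattended run from silently modifying resources when the variable was not set.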
📖 **See [Deployment Guide](docs/DEPLOYMENT_GUIDE.md)** for detailed deployment workflows.
---
## Configuration
### LiteLLM Models
**Customize models:** Edit `helm-values/litellm-values.yaml`
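Before redeploying, it can help to list which Bedrock models the values file currently routes to. The sketch below assumes the file follows LiteLLM's `model_list` convention, where each entry carries a `model: bedrock/<model-id>` line; the actual layout of `helm-values/litellm-values.yaml` may differ, and `list_bedrock_models` is a hypothetical helper.

```bash
# Print the distinct Bedrock model references found in a values file.
# Assumes lines of the form:  model: bedrock/<model-id>  (optionally double-quoted).
list_bedrock_models() {
  local values_file="${1:-helm-values/litellm-values.yaml}"
  awk '$1 == "model:" && $2 ~ /^"?bedrock\// { gsub(/"/, "", $2); print $2 }' \
    "$values_file" | sort -u
}

# Example:
# list_bedrock_models helm-values/litellm-values.yaml
```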
### Required AWS Secrets
| Secret Name | Description | Created By |
|-------------|-------------|------------|
| `litellm/database-url` | PostgreSQL connection string | Manual |
| `litellm/jwt-public-key-url` | Okta JWKS endpoint | Manual |
| `litellm/master-key` | LiteLLM admin key | deploy.sh |
| `litellm/salt-key` | DB encryption salt (immutable) | deploy.sh |
| `litellm/redis-password` | Redis password | deploy.sh |
| `openwebui/webui-secret-key` | Session encryption | Manual |
| `openwebui/okta-openid-url` | Okta OpenID discovery URL | Manual |
| `openwebui/okta-client-id` | Okta app client ID | Manual |
| `openwebui/okta-client-secret` | Okta app client secret | Manual |
| `openwebui/admin-email` | Admin user emails | Manual |
📖 **See [JWT Authentication Setup](docs/JWT_AUTHENTICATION_SETUP.md)** for detailed secret creation.
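A quick way to confirm that nothing in the table above was skipped is to compare it against the names that already exist in Secrets Manager. This is a minimal sketch: `missing_secrets` is a hypothetical helper, and the name list is copied from the table rather than read from the repository.

```bash
# Secret names copied from the "Required AWS Secrets" table.
REQUIRED_SECRETS="
litellm/database-url
litellm/jwt-public-key-url
litellm/master-key
litellm/salt-key
litellm/redis-password
openwebui/webui-secret-key
openwebui/okta-openid-url
openwebui/okta-client-id
openwebui/okta-client-secret
openwebui/admin-email
"

# Reads existing secret names (one per line) on stdin and prints the required
# names that are absent. Returns non-zero if anything is missing.
missing_secrets() {
  local existing name missing=0
  existing="$(cat)"
  for name in $REQUIRED_SECRETS; do
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$name"; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example (needs AWS credentials, so it is left commented out):
# aws secretsmanager list-secrets --region "$AWS_REGION" \
#   --query 'SecretList[].Name' --output text | tr '\t' '\n' | missing_secrets
```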
---
## Monitoring & Observability
### Grafana Dashboards
Access: `kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80`, then open http://localhost:3000
### Prometheus Metrics
Access: `kubectl port-forward -n monitoring svc/kube-prometheus-kube-prome-prometheus 9090:9090`, then open http://localhost:9090
### Jaeger Tracing
Access: `kubectl port-forward -n monitoring svc/jaeger-query 16686:16686`, then open http://localhost:16686
---
## Advanced Features
### Model Context Protocol (MCP) Servers
📖 **See [MCP Deployment Guide](docs/MCP_DEPLOYMENT.md)** for deployment patterns and examples.
📖 **See [MCP Operator Architecture](docs/MCP_OPERATOR_ARCHITECTURE.md)** for Kubernetes operator design.
### Container Registry (ECR)
📖 **See [ECR Setup Guide](docs/ECR_SETUP.md)** for detailed configuration.
### OPA Gatekeeper Policies
Gatekeeper automatically enforces these security policies:
- ✅ Approved container registries only
- ✅ No `:latest` image tags
- ✅ Container resource limits required
- ✅ Non-root containers only
- ✅ Required labels and probes
[View policies](/manifests/opa-policies/)
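Images that would be rejected by the "no `:latest`" rule can be caught locally before they reach the cluster. The check below is a sketch of that single rule in shell; it is not the Rego in `manifests/opa-policies/`, the name `image_tag_ok` is hypothetical, and treating digest-pinned images as acceptable is an assumption that the real constraint may not share.

```bash
# Return 0 if an image reference carries an explicit tag other than "latest".
image_tag_ok() {
  local image="$1" name
  # Digest-pinned references (repo@sha256:...) are immutable; this sketch accepts them.
  case "$image" in
    *@sha256:*) return 0 ;;
  esac
  # Look only at the last path segment so a registry port (host:5000/app)
  # is not mistaken for a tag.
  name="${image##*/}"
  case "$name" in
    *:latest) return 1 ;;   # explicit :latest
    *:?*)     return 0 ;;   # some other tag
    *)        return 1 ;;   # no tag resolves to :latest
  esac
}

# Example:
# image_tag_ok "my-registry.example.com/litellm:v1.2.3" && echo "tag ok"
```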
---
## Troubleshooting
### Quick Diagnostics
```bash
# Check all pods
kubectl get pods -A | grep -E 'litellm|open-webui|monitoring|redis'
# Check External Secrets sync
kubectl get externalsecret -A
kubectl describe externalsecret litellm-secrets -n litellm
# Check LiteLLM logs
kubectl logs -n litellm -l app.kubernetes.io/name=litellm --tail=100
# Check OpenWebUI logs
kubectl logs -n open-webui -l app.kubernetes.io/name=open-webui --tail=100
```
📖 **See [Deployment Guide](docs/DEPLOYMENT_GUIDE.md#troubleshooting)** for comprehensive troubleshooting.
---
## Cleanup
```bash
# Delete bastion host
./scripts/setup-bastion.sh cleanup
# Delete applications
helm uninstall open-webui -n open-webui
helm uninstall litellm -n litellm
helm uninstall redis -n litellm
helm uninstall jaeger -n monitoring
helm uninstall kube-prometheus -n monitoring
helm uninstall external-secrets -n external-secrets
# Delete infrastructure
./scripts/deploy.sh infrastructure-destroy
```
---
## Contributing
### Infrastructure
- **[terraform/README.md](terraform/README.md)** - Terraform deployment guide
See [CONTRIBUTING.md](CONTRIBUTING.md) for git workflow and contribution guidelines.
## Additional Resources
### External Documentation
- [LiteLLM Documentation](https://docs.litellm.ai/)
- [OpenWebUI Documentation](https://docs.openwebui.com/)
- [AWS Bedrock Models](https://aws.amazon.com/bedrock/claude/)
- [OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/)
- [External Secrets Operator](https://external-secrets.io/)
### Related Guides
- [Script Usage](scripts/README.md)
- [Security Architecture](security/ARCHITECTURE.md)
- [OPA Policies](manifests/opa-policies/README.md)