https://github.com/vineethvijay/prox-k8s-lab
A Kubernetes homelab on Proxmox VMs — provisioned with Ansible, managed with ArgoCD and Helm
ansible argocd automation devops docker gitops gpu-passthrough helm high-availability home-server homelab infrastructure-as-code jellyfin kubeadm kubernetes linux media-server plex proxmox self-hosted
- Host: GitHub
- Owner: vineethvijay
- Created: 2026-04-06T09:44:17.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-07T07:44:00.000Z (2 months ago)
- Last Synced: 2026-04-07T09:28:54.678Z (2 months ago)
- Language: Shell
- Size: 3.65 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
# Proxmox Kubernetes Homelab
A complete, code-driven homelab that takes two bare-metal Proxmox hosts from empty to a fully operational HA Kubernetes cluster running self-hosted services — media streaming, automation, DNS, backups, monitoring, and more.
Ansible provisions the VMs and bootstraps the cluster, kubeadm sets up a highly available control plane, and ArgoCD takes over from there — continuously deploying and self-healing every application via GitOps using local Helm charts. Push to `main`, everything syncs. This repo is the single source of truth.
> **Disclaimer:** This project is for **homelab learning and educational purposes** only. I do not support or encourage piracy. The media automation tools included here are meant to be used with legally obtained content.
## How It All Works
This project automates a complete Kubernetes homelab from bare metal to running services. The pipeline flows through four stages: **Ansible provisions VMs on Proxmox**, **kubeadm bootstraps an HA K8s cluster**, **ArgoCD takes over for continuous GitOps delivery**, and **29 self-healing applications** run media automation, streaming, DNS, backups, and more, all driven from this single repository.
### End-to-End Architecture

> **[View Interactive Version](https://vineethvijay.github.io/prox-k8s-lab/docs/architecture-animation-v2.html)** — best viewed in a desktop browser
#### Diagram
```mermaid
flowchart TB
GH["fa:fa-code-branch GitHub Repository
vineethvijay/prox-k8s-lab"]:::git
GH -->|"ansible/
Infrastructure as Code"| ANSIBLE["fa:fa-cogs Ansible
13-Phase Pipeline"]:::ansible
GH -.->|"argocd/ + helm/
GitOps auto-sync on main"| ARGO
ANSIBLE -->|"Phase 1: cloud-init
VM creation"| PROXMOX
subgraph PROXMOX["PROXMOX VE HYPERVISORS"]
subgraph PVE1["pve-local · 192.168.1.11
Intel i7-9750H · 16GB"]
CP1["k8s-cp · .200
Control Plane · 2c/3GB"]:::cp
W1["k8s-w1 · .201
Worker · 3c/4GB"]:::worker
W2["k8s-w2 · .202
Worker · 4c/6GB
iGPU + GTX 1650"]:::gpu
end
subgraph PVE2["pve-remote · 192.168.1.8
Ryzen 5 5500U · 16GB"]
CP2["k8s-cp2 · .205
Control Plane · 2c/3GB"]:::cp
CP3["k8s-cp3 · .206
Control Plane · 2c/3GB"]:::cp
W4["k8s-w4 · .204
Worker · 4c/6GB"]:::worker
end
end
ANSIBLE -->|"Phases 2–7
kubeadm + HA setup"| K8S
subgraph K8S["KUBERNETES v1.32 — 6-NODE HA CLUSTER"]
VIP["kube-vip
VIP 192.168.1.199"]:::net
CNI["Calico CNI
10.244.0.0/16"]:::net
LB["MetalLB L2
Pool .240–.250"]:::net
ING["NGINX Ingress
*.homelab.local"]:::net
ARGO["ArgoCD
App-of-Apps"]:::argo
end
ARGO -->|"auto-sync · self-heal
prune"| APPS
subgraph APPS["29 ARGOCD-MANAGED APPLICATIONS"]
subgraph M["Media & Streaming"]
PLEX["Plex · .241
GPU Transcode"]:::media
JELLY["Jellyfin
GPU Transcode"]:::media
ARR["Sonarr · Radarr · Lidarr
Readarr · Bazarr
Prowlarr · Seerr"]:::media
STATS["Tautulli · Jellystat
Pinchflat · Random Streamer ✦"]:::media
end
subgraph D["Downloads"]
SAB["SABnzbd"]:::download
QB["Downloader
+ Gluetun VPN"]:::download
FL["FlareSolverr"]:::download
end
subgraph P["Platform & Infrastructure"]
PI["Pi-hole DNS · .242"]:::infra
VW["Vaultwarden"]:::infra
DASH["Homepage · Headlamp
Gatus · Glances"]:::infra
BK["Kopia Backup · Kopia UI
Filebrowser"]:::infra
NFP["NFS Provisioners
MetalLB Config
Docker Registry · .245"]:::infra
end
end
subgraph STORAGE["EXTERNAL STORAGE"]
NAS["Synology NAS · .28
16TB RAID"]:::storage
HDD["Proxmox Local HDD"]:::storage
end
STORAGE -.->|"NFS mounts
on all workers"| K8S
classDef git fill:#6e40c9,stroke:#6e40c9,color:#fff
classDef ansible fill:#ee0000,stroke:#cc0000,color:#fff
classDef cp fill:#326ce5,stroke:#2457b5,color:#fff
classDef worker fill:#4a90d9,stroke:#3a7bc8,color:#fff
classDef gpu fill:#76b900,stroke:#5a8f00,color:#fff
classDef net fill:#f5a623,stroke:#d4891a,color:#fff
classDef argo fill:#ef7b4d,stroke:#d4642e,color:#fff
classDef media fill:#e040fb,stroke:#c020d9,color:#fff
classDef download fill:#00bcd4,stroke:#0097a7,color:#fff
classDef infra fill:#607d8b,stroke:#455a64,color:#fff
classDef storage fill:#8bc34a,stroke:#689f38,color:#fff
```
> **Key takeaway:** This repo is the single source of truth. Ansible handles one-time infrastructure provisioning (VMs, cluster, networking). ArgoCD handles ongoing application delivery — push to `main` and everything auto-deploys.
>
> ✦ = My own development
### Provisioning Pipeline
Ansible executes 13 playbooks sequentially to go from bare metal to a fully operational cluster:
```mermaid
flowchart LR
P1["01 Create VMs
cloud-init on Proxmox"]:::infra
P2["02 Prepare Nodes
containerd · kubeadm
kubelet"]:::infra
P3["03 Init Cluster
kubeadm init
Calico CNI
Join workers"]:::k8s
P4["04 Monitoring
Prometheus
Grafana"]:::k8s
P5["05 Ingress
MetalLB
NGINX"]:::k8s
P6["06 Remote Workers
2nd Proxmox host"]:::ha
P7["07 HA Conversion
kube-vip VIP
3 control planes"]:::ha
P8["08 Longhorn Deps
open-iscsi
nfs-common"]:::storage
P9["09 NFS Mounts
NAS media
HDD · Backups"]:::storage
P10["10 DNS Hosts
Mac /etc/hosts"]:::storage
P11["11 ArgoCD
Helm +
App-of-Apps"]:::gitops
P12["12 Glances
Host monitoring"]:::gitops
P13["13 DNS Config
Pi-hole primary
Google fallback"]:::gitops
DONE["Cluster
Operational"]:::done
P1 --> P2 --> P3 --> P4 --> P5 --> P6 --> P7 --> P8 --> P9 --> P10 --> P11 --> P12 --> P13 --> DONE
classDef infra fill:#ee0000,stroke:#cc0000,color:#fff
classDef k8s fill:#326ce5,stroke:#2457b5,color:#fff
classDef ha fill:#f5a623,stroke:#d4891a,color:#fff
classDef storage fill:#8bc34a,stroke:#689f38,color:#fff
classDef gitops fill:#ef7b4d,stroke:#d4642e,color:#fff
classDef done fill:#4caf50,stroke:#388e3c,color:#fff
```
> **Legend:** Red = Infrastructure · Blue = K8s Bootstrap · Orange = HA & Scale · Green = Storage & DNS · Coral = GitOps Handoff
**Milestones:** After Phase 3 you have a working single-CP cluster. Phase 7 upgrades it to HA with kube-vip and 3 control planes. Phase 11 installs ArgoCD and hands off application management — from here, all app changes are GitOps-driven.
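
To check which milestone a cluster has reached, you can count the Ready control-plane nodes. The helper below is a hypothetical sketch (`count_ready_cp` is an illustrative name, not part of the repo); it assumes the default `kubectl get nodes --no-headers` column layout and kubeadm's `control-plane` role as shown in the ROLES column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Ready control-plane nodes.
# Reads `kubectl get nodes --no-headers` output on stdin
# (columns: NAME STATUS ROLES AGE VERSION).
count_ready_cp() {
  # STATUS must start with "Ready" (so cordoned "Ready,SchedulingDisabled"
  # nodes still count, "NotReady" does not); ROLES must mention control-plane.
  awk '$2 ~ /^Ready/ && $3 ~ /control-plane/ { n++ } END { print n + 0 }'
}

# Usage: expect 1 after Phase 3 and 3 after Phase 7.
#   kubectl get nodes --no-headers | count_ready_cp
```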
### GitOps Application Delivery
ArgoCD uses the **App-of-Apps pattern** — one root application auto-discovers and deploys all others:
```mermaid
flowchart TB
DEV["fa:fa-user Developer
git push to main"]:::git
GH["fa:fa-code-branch GitHub
main branch"]:::git
ARGO["fa:fa-sync ArgoCD
Detects drift"]:::argo
AOA["App-of-Apps
argocd/applications/*.yaml"]:::argo
DEV --> GH --> ARGO --> AOA
AOA -->|"13 apps"| MEDIA["Media & Streaming
Plex · Jellyfin · Sonarr · Radarr
Lidarr · Readarr · Bazarr · Prowlarr
Seerr · Tautulli · Jellystat
Pinchflat · Random Streamer ✦"]:::media
AOA -->|"3 apps"| DL["Downloads
SABnzbd · Downloader+VPN
FlareSolverr"]:::download
AOA -->|"4 apps"| DASH["Dashboards & Monitoring
Homepage · Headlamp
Gatus · Glances"]:::dash
AOA -->|"9 apps"| INFRA["Infrastructure & Security
Pi-hole · Vaultwarden · Kopia Backup
Kopia UI · Filebrowser · NFS Provisioner
NFS HDD Provisioner · MetalLB Config
Docker Registry"]:::infra
MEDIA --> HELM["helm/charts/*
Local Helm charts"]:::helm
DL --> HELM
DASH --> HELM
INFRA --> HELM
HELM --> DEPLOY["Deployed to Cluster
auto-sync · self-heal · prune"]:::done
classDef git fill:#6e40c9,stroke:#6e40c9,color:#fff
classDef argo fill:#ef7b4d,stroke:#d4642e,color:#fff
classDef media fill:#e040fb,stroke:#c020d9,color:#fff
classDef download fill:#00bcd4,stroke:#0097a7,color:#fff
classDef dash fill:#f5a623,stroke:#d4891a,color:#fff
classDef infra fill:#607d8b,stroke:#455a64,color:#fff
classDef helm fill:#0f1689,stroke:#0a1060,color:#fff
classDef done fill:#4caf50,stroke:#388e3c,color:#fff
```
> Every application manifest in `argocd/applications/` points to a local Helm chart in `helm/charts/`. ArgoCD renders the chart and applies it to the cluster. If someone manually changes a resource, ArgoCD **self-heals** it back to the Git-defined state.
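
Because each manifest must point at a chart that actually exists, a quick pre-push check can catch a typo before ArgoCD reports a sync error. This is a sketch under assumptions: it expects each manifest to set its chart location as `path: helm/charts/<name>` (the `spec.source.path` field of an ArgoCD Application), takes the first `path:` key it finds, and must be run from the repository root. `check_chart_paths` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-push check: every Application manifest should reference
# an existing chart directory. Assumes `path: <dir>` appears in each file
# (spec.source.path) and that this runs from the repository root.
check_chart_paths() {
  local apps_dir="${1:-argocd/applications}" missing=0 f path
  for f in "$apps_dir"/*.yaml; do
    [ -e "$f" ] || continue          # glob matched nothing
    # First `path:` key in the file, with any surrounding quotes removed.
    path=$(awk '$1 == "path:" { print $2; exit }' "$f" | tr -d "\"'")
    if [ -z "$path" ]; then
      echo "no path: $f"; missing=1
    elif [ ! -d "$path" ]; then
      echo "missing chart: $f -> $path"; missing=1
    fi
  done
  return "$missing"
}

# Usage (from the repo root):
#   check_chart_paths && echo "all charts present"
```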
### Network & Traffic Flow
All services are exposed through a MetalLB + NGINX Ingress stack, with Pi-hole handling local DNS:
```mermaid
flowchart LR
subgraph CLIENT["Client"]
USER["fa:fa-globe Browser / App"]:::client
end
subgraph DNS["DNS Resolution"]
PIHOLE["Pi-hole
192.168.1.242"]:::dns
end
subgraph METALLB["MetalLB L2 ARP"]
VIP240["Ingress VIP
.240"]:::lb
VIP241["Plex VIP
.241"]:::lb
VIP242["Pi-hole VIP
.242"]:::lb
VIP245["Registry VIP
.245"]:::lb
end
subgraph K8S["Kubernetes Cluster"]
NGX["NGINX Ingress
Host-based routing"]:::ingress
SVC["ClusterIP Services"]:::svc
POD["Application Pods"]:::pod
NGX --> SVC --> POD
end
USER -->|"*.homelab.local"| PIHOLE
PIHOLE -->|"resolves to .240"| VIP240
VIP240 --> NGX
USER -.->|"Direct IP .241 / .242 / .245"| VIP241
VIP241 -.->|"externalTrafficPolicy:
Local"| POD
classDef client fill:#6e40c9,stroke:#6e40c9,color:#fff
classDef dns fill:#4caf50,stroke:#388e3c,color:#fff
classDef lb fill:#f5a623,stroke:#d4891a,color:#fff
classDef ingress fill:#326ce5,stroke:#2457b5,color:#fff
classDef svc fill:#607d8b,stroke:#455a64,color:#fff
classDef pod fill:#e040fb,stroke:#c020d9,color:#fff
```
> **Standard path:** Client queries Pi-hole → resolves `*.homelab.local` to `192.168.1.240` → MetalLB advertises via L2 ARP → NGINX Ingress routes by Host header → reaches the pod.
>
> **Direct path:** Plex (`.241`), Pi-hole (`.242`), and Docker Registry (`.245`) get dedicated MetalLB IPs, bypassing the ingress controller entirely.
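
When giving another service a dedicated LoadBalancer IP, the address has to fall inside the MetalLB pool and avoid the ones already taken. The sketch below is illustrative only: the function names are hypothetical, and `USED_POOL_IPS` lists just the addresses this README mentions (`.240`, `.241`, `.242`, `.245`); other pool IPs may already be in use on a live cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the MetalLB pool 192.168.1.240-192.168.1.250.

# Succeeds only if $1 is an address inside the pool.
in_metallb_pool() {
  local ip="$1" last
  case "$ip" in
    192.168.1.*) last="${ip#192.168.1.}" ;;
    *) return 1 ;;
  esac
  [[ "$last" =~ ^[0-9]+$ ]] || return 1      # reject "2.3", "", etc.
  (( 10#$last >= 240 && 10#$last <= 250 ))
}

# Last octets already claimed in this README (ingress, Plex, Pi-hole, registry).
USED_POOL_IPS=(240 241 242 245)

# Print pool addresses not in USED_POOL_IPS.
free_pool_ips() {
  local i u taken
  for i in {240..250}; do
    taken=0
    for u in "${USED_POOL_IPS[@]}"; do
      [ "$u" = "$i" ] && taken=1
    done
    [ "$taken" = 0 ] && echo "192.168.1.$i"
  done
  return 0
}
```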
---
### Node Inventory
| Node | IP | Role | vCPU | RAM | Proxmox Host | GPU |
|---|---|---|---|---|---|---|
| k8s-cp | 192.168.1.200 | Control Plane | 2 | 3GB | .11 | — |
| k8s-w1 | 192.168.1.201 | Worker | 3 | 4GB | .11 | — |
| k8s-w2 | 192.168.1.202 | Worker | 4 | 6GB | .11 | Intel UHD 630 + NVIDIA GTX 1650 |
| k8s-cp2 | 192.168.1.205 | Control Plane | 2 | 3GB | .8 | — |
| k8s-cp3 | 192.168.1.206 | Control Plane | 2 | 3GB | .8 | — |
| k8s-w4 | 192.168.1.204 | Worker | 4 | 6GB | .8 | — |
**Totals:** .11 → 9c / 13GB (3 VMs) · .8 → 8c / 12GB (3 VMs)
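
The totals are a straight sum per Proxmox host. As a sketch, this `awk` one-off (the `host_totals` name is illustrative) recomputes them from the inventory table, which is handy after resizing a VM. Against each host's 16GB of physical RAM, the VMs leave roughly 3GB on `.11` and 4GB on `.8` for Proxmox itself.

```bash
#!/usr/bin/env bash
# Sketch: recompute per-host vCPU / RAM / VM-count totals.
# Input lines: <node> <proxmox-host> <vcpu> <ram-gb>
host_totals() {
  awk '{ cpu[$2] += $3; ram[$2] += $4; vms[$2]++ }
       END { for (h in cpu) printf "%s %dc %dGB %d VMs\n", h, cpu[h], ram[h], vms[h] }' |
    LC_ALL=C sort
}

# Values copied from the Node Inventory table above.
host_totals <<'EOF'
k8s-cp  .11 2 3
k8s-w1  .11 3 4
k8s-w2  .11 4 6
k8s-cp2 .8  2 3
k8s-cp3 .8  2 3
k8s-w4  .8  4 6
EOF
# .11 9c 13GB 3 VMs
# .8 8c 12GB 3 VMs
```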
## Cluster Components
| Component | Details |
|---|---|
| OS | Ubuntu 24.04 (cloud-init) |
| Kubernetes | v1.32.x (kubeadm) |
| Container Runtime | containerd 1.7.x |
| CNI | Calico (tigera-operator) |
| HA | kube-vip (ARP, leader election) — VIP `192.168.1.199` |
| Load Balancer | MetalLB — pool `192.168.1.240–250` |
| Ingress | NGINX Ingress Controller (`192.168.1.240`) |
| Storage | Longhorn (replicated), local-path-provisioner |
| GPU (k8s-w2) | Intel QuickSync (iGPU) + NVIDIA GTX 1650 (driver 535, device-plugin v0.14.5) |
| GitOps | ArgoCD — App-of-Apps pattern, repo as source of truth |
| Auto-update | Keel (poll-based image updates) |
| Metrics | metrics-server |
## Networking
| Resource | IP | Purpose |
|---|---|---|
| Control Plane VIP | `192.168.1.199` | HA API server endpoint (kube-vip) |
| Ingress LB | `192.168.1.240` | All `*.homelab.local` / `*.k8s.local` services |
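| Plex LB | `192.168.1.241` | Dedicated Plex LoadBalancer (`externalTrafficPolicy: Local`) |
| Pi-hole LB | `192.168.1.242` | Dedicated Pi-hole DNS LoadBalancer |
| Docker Registry LB | `192.168.1.245` | Dedicated Docker Registry LoadBalancer |
| Synology NAS | `192.168.1.28` | NFS media storage (16TB) |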
## Storage
| Class | Provisioner | Use Case |
|---|---|---|
| `longhorn` (default) | Longhorn | Replicated PVCs — app config, databases |
| `local-path` | Rancher local-path | Single-node fast local storage |
NFS mounts on all workers:
- `/mnt/nfs/nas-media` → `192.168.1.28:/data/nas-media` (Synology NAS, 16TB)
- `/mnt/nfs/hdd-int` → `192.168.1.11:/data/hdd-internal` (Proxmox local HDD)
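
For reference, the two mounts above map to `/etc/fstab` entries like the ones this sketch renders. The mount options (`defaults,_netdev`) are an assumption and not necessarily what the `09-setup-nfs-mounts.yml` playbook configures; `nfs_fstab_lines` and `NFS_OPTS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: render fstab lines for the worker NFS mounts.
# Input lines: <mount-point> <server>:<export>
# NFS_OPTS overrides the (assumed) default mount options.
nfs_fstab_lines() {
  local opts="${NFS_OPTS:-defaults,_netdev}" mnt src
  while read -r mnt src; do
    [ -n "$mnt" ] || continue
    printf '%s %s nfs %s 0 0\n' "$src" "$mnt" "$opts"
  done
}

nfs_fstab_lines <<'EOF'
/mnt/nfs/nas-media 192.168.1.28:/data/nas-media
/mnt/nfs/hdd-int   192.168.1.11:/data/hdd-internal
EOF
# 192.168.1.28:/data/nas-media /mnt/nfs/nas-media nfs defaults,_netdev 0 0
# 192.168.1.11:/data/hdd-internal /mnt/nfs/hdd-int nfs defaults,_netdev 0 0
```

After appending lines like these to `/etc/fstab` on a worker, `sudo mount -a` mounts them.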
## Services
### Media Streaming
| Service | URL | Node | Notes |
|---|---|---|---|
| Plex | `http://192.168.1.241:32400` / `plex.homelab.local` | k8s-w2 | GPU transcoding (Intel QuickSync + NVIDIA NVENC), dedicated LB IP |
| Jellyfin | `jellyfin.homelab.local` | k8s-w2 | GPU transcoding, Intel QuickSync |
| Tautulli | `tautulli.homelab.local` | k8s-w4 | Plex monitoring |
| Jellystat | `jellystat.homelab.local` | k8s-w4 | Jellyfin monitoring (+ PostgreSQL DB) |
| Pinchflat | `pinchflat.homelab.local` | k8s-w4 | YouTube channel archiver |
| Random Streamer | `streamer.homelab.local` | k8s-w2 | Random video clips live stream (**my own development**) |
### Media Automation (Arr Stack)
| Service | URL | Purpose |
|---|---|---|
| Sonarr | `sonarr.homelab.local` | TV show management |
| Radarr | `radarr.homelab.local` | Movie management |
| Lidarr | `lidarr.homelab.local` | Music management |
| Readarr | `readarr.homelab.local` | Book management |
| Bazarr | `bazarr.homelab.local` | Subtitle management |
| Prowlarr | `prowlarr.homelab.local` | Indexer management |
| Seerr | `seerr.homelab.local` | Media request management |
### Download Clients
| Service | URL | Notes |
|---|---|---|
| SABnzbd | `sabnzbd.homelab.local` | Usenet downloader |
| Downloader | `downloader.homelab.local` | Download client (via Gluetun VPN) |
| FlareSolverr | — | Cloudflare bypass proxy for Prowlarr |
### Cluster Management
| Service | URL | Purpose |
|---|---|---|
| Homepage | `homepage.homelab.local` | Dashboard with service discovery |
| Headlamp | `headlamp.k8s.local` | Kubernetes web UI |
| ArgoCD | `argocd.homelab.local` | GitOps continuous delivery |
| Keel | `keel.k8s.local` | Automated image updates |
| Longhorn | `longhorn.k8s.local` | Storage dashboard |
| Filebrowser | `filebrowser.homelab.local` | NFS file browser |
### DNS Setup
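If a client does not use Pi-hole for DNS, add these entries to its `/etc/hosts`, pointing at the MetalLB ingress IP `192.168.1.240` (Phase 10, `10-add-hosts.yml`, does this for the local machine):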
```
192.168.1.240 homepage.homelab.local jellyfin.homelab.local sonarr.homelab.local radarr.homelab.local bazarr.homelab.local seerr.homelab.local tautulli.homelab.local sabnzbd.homelab.local readarr.homelab.local prowlarr.homelab.local downloader.homelab.local filebrowser.homelab.local jellystat.homelab.local lidarr.homelab.local plex.homelab.local pinchflat.homelab.local streamer.homelab.local
192.168.1.240 argocd.homelab.local headlamp.k8s.local longhorn.k8s.local keel.k8s.local
```
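
Appending that block by hand twice leaves duplicate entries. A minimal idempotent sketch follows; `add_ingress_hosts` and `HOSTS_FILE` are hypothetical names, and a name counts as present if it appears anywhere in the file as a whole word (including in commented-out lines).

```bash
#!/usr/bin/env bash
# Sketch: append only the ingress hostnames not already in the hosts file.
# HOSTS_FILE defaults to /etc/hosts; point it at a copy for a dry run.
add_ingress_hosts() {
  local file="${HOSTS_FILE:-/etc/hosts}" ip="192.168.1.240" name
  local -a missing=()
  for name in "$@"; do
    # -F: treat dots literally; -w: "plex.homelab.local" must not match
    # inside a longer name such as "myplex.homelab.local".
    grep -qwF -- "$name" "$file" || missing+=("$name")
  done
  if ((${#missing[@]})); then
    printf '%s %s\n' "$ip" "${missing[*]}" >> "$file"
  fi
}

# Usage (from a root shell):
#   add_ingress_hosts plex.homelab.local sonarr.homelab.local argocd.homelab.local
```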
## GPU Passthrough (k8s-w2)
k8s-w2 has two GPUs passed through from Proxmox .11 via VFIO:
| GPU | PCI ID | Use Case |
|---|---|---|
| Intel UHD 630 (iGPU) | `8086:3e9b` | Plex/Jellyfin QuickSync transcoding via `/dev/dri` |
| NVIDIA GTX 1650 Mobile | `10de:1f91` | NVENC transcoding, CUDA workloads via `/dev/nvidia*` |
- NVIDIA driver 535 is blacklisted at boot (to avoid udev crashes) and loaded afterwards by a systemd service
- `nvidia-container-toolkit` configured with containerd
- `nvidia-device-plugin` v0.14.5 DaemonSet exposes `nvidia.com/gpu` resource
- Plex container runs privileged with both `/dev/dri` and `/dev/nvidia*` mounted
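
A quick way to confirm that VFIO passthrough worked is to look for both PCI IDs from the table inside the k8s-w2 VM. This is a sketch: `check_passthrough` is a hypothetical name, and it relies on `lspci -nn` printing IDs in the `[vendor:device]` form.

```bash
#!/usr/bin/env bash
# Sketch: verify both passed-through GPUs are visible inside the VM.
# Reads `lspci -nn` output on stdin; returns non-zero if either is missing.
check_passthrough() {
  local out id rc=0
  out=$(cat)
  for id in 8086:3e9b 10de:1f91; do      # Intel UHD 630, GTX 1650 Mobile
    if grep -qF "[$id]" <<<"$out"; then
      echo "found $id"
    else
      echo "MISSING $id"; rc=1
    fi
  done
  return "$rc"
}

# Usage on k8s-w2:
#   lspci -nn | check_passthrough
```

Inside the cluster, `kubectl describe node k8s-w2 | grep nvidia.com/gpu` shows whether the device plugin is advertising the GPU.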
## Quick Start
All provisioning is done via Ansible from your local machine. See `ansible/setup.sh` for initial setup.
```bash
cd ansible
# Run everything end-to-end (all 13 phases)
ansible-playbook playbooks/site.yml
# Or run individual phases:
ansible-playbook playbooks/01-create-vms.yml # Create VMs via cloud-init
ansible-playbook playbooks/02-prepare-nodes.yml # Install containerd, kubeadm, kubelet
ansible-playbook playbooks/03-init-cluster.yml # Bootstrap cluster + Calico + join workers
ansible-playbook playbooks/04-install-monitoring.yml # Prometheus + Grafana
ansible-playbook playbooks/05-install-ingress.yml # MetalLB + NGINX Ingress
ansible-playbook playbooks/06-add-remote-worker.yml # Add 2nd Proxmox host nodes
ansible-playbook playbooks/07-convert-ha.yml # kube-vip + 3 control planes
ansible-playbook playbooks/08-install-longhorn-deps.yml # open-iscsi + nfs-common
ansible-playbook playbooks/09-setup-nfs-mounts.yml # NAS media + HDD NFS mounts
ansible-playbook playbooks/10-add-hosts.yml # Local DNS (/etc/hosts)
ansible-playbook playbooks/11-install-argocd.yml # ArgoCD App-of-Apps
ansible-playbook playbooks/12-install-proxmox-glances.yml # Glances host monitoring
ansible-playbook playbooks/13-set-dns.yml # Node DNS: Pi-hole primary, Google fallback
```
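
`site.yml` always starts from Phase 1. To resume after a failure partway through, a small wrapper can run the remaining phases in order. This is a hypothetical sketch (`PHASES`, `phases_from`, and `run_from` are illustrative names) that uses the playbook filenames listed above and must be run from `ansible/`.

```bash
#!/usr/bin/env bash
# Hypothetical resume helper: run phases N..13 in order, stopping on failure.
PHASES=(
  01-create-vms 02-prepare-nodes 03-init-cluster 04-install-monitoring
  05-install-ingress 06-add-remote-worker 07-convert-ha
  08-install-longhorn-deps 09-setup-nfs-mounts 10-add-hosts
  11-install-argocd 12-install-proxmox-glances 13-set-dns
)

# Print playbook paths whose numeric prefix is >= $1.
phases_from() {
  local start="$1" p
  for p in "${PHASES[@]}"; do
    # 10# forces base 10 so "08" and "09" are not parsed as octal.
    if (( 10#${p%%-*} >= 10#$start )); then
      echo "playbooks/$p.yml"
    fi
  done
}

run_from() {
  local pb
  for pb in $(phases_from "$1"); do
    ansible-playbook "$pb" || { echo "failed: $pb" >&2; return 1; }
  done
}

# Usage: run_from 8   # reruns phases 08 through 13
```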
Access the cluster:
```bash
export KUBECONFIG=~/.kube/config-proxmox
kubectl get nodes
```
## Teardown
```bash
cd ansible
ansible-playbook playbooks/teardown.yml
```
## Troubleshooting
```bash
# Check kubelet logs
ssh vineethvijay@192.168.1.200 "sudo journalctl -u kubelet -f"
# Re-generate join token
ssh vineethvijay@192.168.1.200 "sudo kubeadm token create --print-join-command"
# Remove (reset) a node from the cluster
ansible-playbook ansible/playbooks/remove-node.yml -e "target_node=k8s-w1"
# Check GPU on k8s-w2
ssh vineethvijay@192.168.1.202 "nvidia-smi; ls /dev/dri/"
# Force-delete a stuck pod (replace the placeholders)
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
```
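
To find which pods need that force-delete, this sketch filters for pods stuck in `Terminating` and prints the matching commands for review instead of running them. `stuck_pod_cmds` is a hypothetical name; it assumes the default `kubectl get pods -A --no-headers` column order.

```bash
#!/usr/bin/env bash
# Sketch: print force-delete commands for pods stuck in Terminating.
# Reads `kubectl get pods -A --no-headers` on stdin
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE). Deletes nothing itself.
stuck_pod_cmds() {
  awk '$4 == "Terminating" {
    printf "kubectl delete pod %s -n %s --force --grace-period=0\n", $2, $1
  }'
}

# Usage:
#   kubectl get pods -A --no-headers | stuck_pod_cmds
```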