https://github.com/jfreed-dev/turing-rk1-cluster
4-node bare-metal Kubernetes cluster on Turing RK1 (RK3588) with Talos Linux, Longhorn storage, and RKNN NPU support
- Host: GitHub
- URL: https://github.com/jfreed-dev/turing-rk1-cluster
- Owner: jfreed-dev
- License: apache-2.0
- Created: 2025-12-22T02:03:32.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-01-26T14:24:07.000Z (about 2 months ago)
- Last Synced: 2026-01-26T17:56:04.465Z (about 2 months ago)
- Topics: arm64, armbian, bare-metal, edge-computing, homelab, k3s, kubernetes, longhorn, machine-learning, npu, rk3588, rknn, rockchip, talos-linux, turing-pi
- Language: Shell
- Homepage: https://github.com/jfreed-dev/turing-rk1-cluster
- Size: 345 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# Turing RK1 Kubernetes Cluster
A 4-node bare-metal Kubernetes cluster built on Turing RK1 compute modules, supporting both **Talos Linux** and **K3s on Armbian** distributions. Designed for edge computing, AI/ML workloads with NPU acceleration, and distributed storage.
## Choose Your Distribution
| Distribution | Best For | NPU/GPU | Shell Access |
|--------------|----------|---------|--------------|
| **[Talos Linux](docs/INSTALLATION.md)** | Production, Security | No | API only |
| **[K3s on Armbian](docs/INSTALLATION-K3S.md)** | Development, AI/ML | **Yes** | SSH |
See [docs/COMPARISON.md](docs/COMPARISON.md) for detailed feature comparison.
### Quick Start
```bash
# Talos Linux (automated deployment)
./scripts/deploy-talos-cluster.sh prereq # Check prerequisites
./scripts/deploy-talos-cluster.sh deploy # Full deployment
# K3s on Armbian
./scripts/setup-k3s-node.sh # Run on each node
./scripts/deploy-k3s-cluster.sh # Deploy from workstation
# Check cluster status (works with both distributions)
./scripts/talos-cluster-status.sh # Auto-detects and shows health summary
```
> **Note**: This project is under active development. See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.
## Hardware Summary
### Turing Pi 2 Board
| Component | Specification |
|-----------|---------------|
| Form Factor | Mini-ITX |
| Node Slots | 4x CM4/RK1 compatible |
| BMC | Integrated management controller |
| Networking | Gigabit Ethernet per node |
| Storage | NVMe slot per node |
### Turing RK1 Compute Modules (x4)
| Component | Specification |
|-----------|---------------|
| SoC | Rockchip RK3588 |
| CPU | 4x Cortex-A76 @ 2.4GHz + 4x Cortex-A55 @ 1.8GHz |
| RAM | 16GB / 32GB LPDDR4X |
| GPU | Mali-G610 MP4 |
| NPU | 6 TOPS (INT8) - *see limitations* |
| eMMC | 32GB (system disk) |
| NVMe | 500GB Crucial P3 (worker nodes) |
### Cluster Topology
```
┌─────────────────────────────────────────────────────────────┐
│ Turing Pi 2 BMC │
│ 10.10.88.70 │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│ Node 1 │ Node 2 │ Node 3 │ Node 4 │
│ Control Pl. │ Worker │ Worker │ Worker │
│ 10.10.88.73 │ 10.10.88.74 │ 10.10.88.75 │ 10.10.88.76 │
│ 32GB eMMC │ 32GB + 500GB│ 32GB + 500GB│ 32GB + 500GB │
└─────────────┴─────────────┴─────────────┴───────────────────┘
```
### Total Resources
| Resource | Amount |
|----------|--------|
| CPU Cores | 32 (8 per node) |
| RAM | 64-128GB |
| Storage (eMMC) | 128GB |
| Storage (NVMe) | 1.5TB |
| Network | 4x 1Gbps |
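The totals follow directly from the per-node figures in the hardware tables. As a quick sanity check, the arithmetic can be sketched in shell; the variable names are illustrative and not taken from the repository's scripts.

```bash
# Derive the cluster totals from the per-node figures in the hardware tables.
# Variable names are illustrative, not taken from the repository's scripts.

nodes=4            # RK1 modules on the Turing Pi 2 board
workers=3          # nodes 2-4 carry an NVMe drive
cores_per_node=8   # 4x Cortex-A76 + 4x Cortex-A55
emmc_gb=32         # eMMC per node
nvme_gb=500        # NVMe per worker node

total_cores=$(( nodes * cores_per_node ))
total_emmc_gb=$(( nodes * emmc_gb ))
total_nvme_gb=$(( workers * nvme_gb ))
ram_min_gb=$(( nodes * 16 ))   # every module is the 16GB variant
ram_max_gb=$(( nodes * 32 ))   # every module is the 32GB variant

echo "CPU cores: ${total_cores}"
echo "RAM: ${ram_min_gb}-${ram_max_gb}GB"
echo "eMMC: ${total_emmc_gb}GB"
echo "NVMe: ${total_nvme_gb}GB"
```

The NVMe total counts only the three worker nodes, which is why it comes to 1.5TB rather than 2TB.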
---
## Software Stack
### Operating System
| Component | Version | Notes |
|-----------|---------|-------|
| Talos Linux | v1.11.6 | Immutable, API-driven Kubernetes OS |
| Linux Kernel | 6.12.62 | Mainline kernel (ARM64) |
### Kubernetes Components
| Component | Version | Purpose |
|-----------|---------|---------|
| Kubernetes | v1.34.1 | Container orchestration |
| containerd | v2.1.5 | Container runtime |
| etcd | Bundled | Distributed key-value store |
### Storage
| Component | Version | Purpose |
|-----------|---------|---------|
| Longhorn | Latest | Distributed block storage |
| CSI Driver | Longhorn | Persistent volume provisioning |
### Networking
| Component | Version | Purpose |
|-----------|---------|---------|
| Flannel | Bundled | Pod networking (CNI) |
| MetalLB | Latest | LoadBalancer for bare-metal |
| NGINX Ingress | Latest | HTTP/HTTPS ingress controller |
### Monitoring
| Component | Version | Purpose |
|-----------|---------|---------|
| Prometheus | Latest | Metrics collection & alerting |
| Grafana | Latest | Visualization & dashboards |
| Alertmanager | Latest | Alert routing & management |
| Node Exporter | Latest | Host-level metrics |
| kube-state-metrics | Latest | Kubernetes state metrics |
### Management
| Component | Version | Purpose |
|-----------|---------|---------|
| Portainer Agent | v2.33.6 | Remote cluster management |
| talosctl | v1.11.6 | Talos node management |
| kubectl | v1.34.x | Kubernetes CLI |
| Helm | v3.x | Package manager |
---
## Cluster Capabilities
### What This Cluster Can Do
**Container Orchestration**
- Run containerized workloads across 4 nodes
- Automatic pod scheduling and load balancing
- Rolling updates and rollbacks
- Health monitoring and self-healing
**Distributed Storage**
- ~1.5TB distributed storage via Longhorn
- Volume replication across nodes (configurable 1-3 replicas)
- Snapshots and backups
- Dynamic volume provisioning
- High-performance NVMe-backed storage class
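The replica count is set per storage class. A minimal sketch of how such a class could be rendered is shown here; the class name `longhorn-nvme` and the helper function are hypothetical, while the provisioner and parameter keys follow the Longhorn documentation. The repository's own storage classes may be defined differently (see [docs/STORAGE.md](docs/STORAGE.md)).

```bash
# Print a Longhorn StorageClass manifest with a chosen replica count.
# The class name is hypothetical; provisioner and parameter keys follow the
# Longhorn documentation.
longhorn_storageclass() {
  local name="$1" replicas="$2"
  case "$replicas" in
    1|2|3) ;;
    *) echo "replicas must be 1, 2, or 3" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "${replicas}"
  staleReplicaTimeout: "30"
EOF
}

# Render a 2-replica class for review.
longhorn_storageclass longhorn-nvme 2
```

Once the output looks right, it can be piped to `kubectl apply -f -`.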
**Networking**
- LoadBalancer services via MetalLB (10.10.88.80-89)
- HTTP/HTTPS ingress with NGINX
- TLS termination
- Path and host-based routing
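For orientation, a layer-2 MetalLB setup for the range above typically consists of an `IPAddressPool` and an `L2Advertisement`. The sketch below renders both; the resource names are hypothetical, and the repository's actual configuration lives in `cluster-config/metallb-config.yaml` and may differ.

```bash
# Print MetalLB layer-2 configuration for an address range.
# Resource names are hypothetical; the repository keeps its own version in
# cluster-config/metallb-config.yaml, which may differ.
metallb_l2_config() {
  local pool_name="$1" range="$2"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ${pool_name}
  namespace: metallb-system
spec:
  addresses:
    - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ${pool_name}-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ${pool_name}
EOF
}

metallb_l2_config homelab-pool 10.10.88.80-10.10.88.89
```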
**Edge Computing**
- Low-power ARM64 architecture (~10W per node)
- Compact form factor (Mini-ITX)
- Suitable for remote/edge deployments
**Development & Testing**
- Full Kubernetes API compatibility
- Helm chart deployment
- GitOps-ready
- Multi-architecture image support (arm64)
**AI/ML Workloads (CPU)**
- ARM64-optimized inference
- NumPy, ONNX Runtime, PyTorch (CPU)
- ~12 GFLOPS matrix operations per node
- Distributed training/inference across nodes
**Monitoring & Observability**
- Full cluster metrics via Prometheus
- Pre-configured Grafana dashboards
- Node, pod, and container-level monitoring
- Alerting with Alertmanager
- External Docker host monitoring support
- Longhorn storage metrics integration
---
## Limitations & Known Issues
### NPU Not Available (Talos Only)
| Issue | Status | Details |
|-------|--------|---------|
| RK3588 NPU inaccessible | **Talos: Not Supported** | Talos uses the mainline Linux kernel, which lacks Rockchip's proprietary RKNPU driver |
| | **K3s/Armbian: Supported** | BSP kernel includes full NPU support |
**Impact:** On Talos, the 6 TOPS NPU in each RK3588 cannot be used for hardware-accelerated AI inference.
**Solutions:**
1. **Use K3s on Armbian** - Full NPU support with RKNN SDK (see [docs/INSTALLATION-K3S.md](docs/INSTALLATION-K3S.md))
2. Use CPU-based inference on Talos (ONNX Runtime, TensorFlow Lite)
3. Wait for mainline NPU driver (in kernel review)
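To see which situation a given node is in, one option is to look for the vendor driver's debugfs entries. This is a rough sketch under an assumption: that the BSP kernel exposes driver information under `/sys/kernel/debug/rknpu`. The function name is hypothetical and the path can be overridden if your image lays things out differently.

```bash
# Report whether the Rockchip RKNPU driver appears to be loaded.
# Assumption: the vendor (BSP) kernel exposes driver info under
# /sys/kernel/debug/rknpu; pass another directory to override.
npu_driver_present() {
  local debug_dir="${1:-/sys/kernel/debug/rknpu}"
  if [ -r "${debug_dir}/version" ]; then
    echo "RKNPU driver found: $(cat "${debug_dir}/version")"
    return 0
  fi
  echo "RKNPU driver not found under ${debug_dir}" >&2
  return 1
}

# Run as root on an Armbian node (debugfs is usually root-only):
#   npu_driver_present && echo "NPU workloads can be scheduled on this node"
```

Talos has no shell, so this check only applies to Armbian nodes.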
### GPU Not Available (Talos Only)
| Issue | Status | Details |
|-------|--------|---------|
| Mali-G610 GPU inaccessible | **Talos: Not Supported** | No GPU driver/passthrough in Talos |
| | **K3s/Armbian: Supported** | OpenCL and Vulkan available |
**Impact:** On Talos, no GPU acceleration for graphics or compute workloads. K3s on Armbian provides full GPU support.
### Storage Limitations
| Issue | Status | Details |
|-------|--------|---------|
| Control plane has no NVMe | By Design | Only workers have NVMe; CP uses eMMC only |
| Single replica risk | Configurable | Default is 3 replicas; in 2-replica mode, a node failure leaves a single copy with no redundancy until the replica is rebuilt |
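The replica count also trades capacity for redundancy. A back-of-the-envelope sketch, assuming Longhorn uses only the three worker NVMe drives and ignoring filesystem overhead and Longhorn's reserved space:

```bash
# Approximate usable Longhorn capacity for a given replica count.
# Simplification: ignores filesystem overhead and Longhorn's reserved space.
usable_gb() {
  local raw_gb="$1" replicas="$2"
  echo $(( raw_gb / replicas ))
}

raw_gb=$(( 3 * 500 ))   # three worker nodes, one 500GB NVMe each
for replicas in 1 2 3; do
  echo "replicas=${replicas} usable=$(usable_gb "$raw_gb" "$replicas")GB"
done
```

With three replicas on three workers, every worker holds a full copy of each volume, so usable space is roughly the size of one drive.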
### Network Limitations
| Issue | Status | Details |
|-------|--------|---------|
| No native LoadBalancer | Mitigated | MetalLB provides L2 LoadBalancer functionality |
| Single network interface | Hardware | Each node has only 1x 1Gbps NIC |
### Talos-Specific Considerations
| Issue | Details |
|-------|---------|
| Immutable filesystem | Cannot install packages; must use extensions or containers |
| No SSH access | Nodes managed via `talosctl` API only |
| Privileged namespaces | Many add-ons require `pod-security.kubernetes.io/enforce=privileged` label |
### Known Bugs
| Issue | Status | Workaround |
|-------|--------|------------|
| PodSecurity warnings on deploy | Expected | Label namespaces as privileged |
| MetalLB speaker pods require privileges | Expected | Namespace is pre-labeled |
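The labeling workaround can be sketched as a small helper that prints the `kubectl` command for review instead of running it. The function name and the namespace list are illustrative; the label key is the standard Pod Security Admission label mentioned above.

```bash
# Build the kubectl command that sets the Pod Security Admission level on a
# namespace. The function only prints the command so it can be reviewed first.
psa_label_cmd() {
  local namespace="$1" level="${2:-privileged}"
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "unknown Pod Security level: ${level}" >&2; return 1 ;;
  esac
  printf 'kubectl label namespace %s pod-security.kubernetes.io/enforce=%s --overwrite\n' \
    "$namespace" "$level"
}

# Namespace names below are examples; adjust them to the add-ons you deploy.
for ns in metallb-system longhorn-system; do
  psa_label_cmd "$ns"
done
```

Printing rather than executing keeps the privileged label a deliberate step for each namespace.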
---
## Network Configuration
### IP Allocation
| Resource | IP Address | Port(s) |
|----------|------------|---------|
| BMC | 10.10.88.70 | 22 (SSH) |
| Control Plane | 10.10.88.73 | 6443 (API) |
| Worker 1 | 10.10.88.74 | - |
| Worker 2 | 10.10.88.75 | - |
| Worker 3 | 10.10.88.76 | - |
| Ingress Controller | 10.10.88.80 | 80, 443 |
| Portainer Agent | 10.10.88.81 | 9001 |
| Available Pool | 10.10.88.82-89 | - |
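As a quick way to see what is left in the LoadBalancer pool, the allocation table can be expressed as a short loop. This is only a static sketch of the table above; it does not query MetalLB.

```bash
# List the MetalLB pool (10.10.88.80-89) and mark the addresses the table assigns.
# Static sketch of the allocation table; it does not query the cluster.
pool_status() {
  local prefix="10.10.88" octet
  for octet in $(seq 80 89); do
    case "$octet" in
      80) echo "${prefix}.${octet} ingress-controller" ;;
      81) echo "${prefix}.${octet} portainer-agent" ;;
      *)  echo "${prefix}.${octet} available" ;;
    esac
  done
}

pool_status
echo "free addresses: $(pool_status | grep -c ' available$')"
```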
### Internal Networks
| Network | CIDR | Purpose |
|---------|------|---------|
| Pod Network | 10.244.0.0/16 | Container IPs |
| Service Network | 10.96.0.0/12 | ClusterIP services |
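These internal ranges must not overlap the node network. The check can be sketched with plain shell arithmetic; the helper names are hypothetical.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if the address lies inside the CIDR block, 1 otherwise.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Confirm that node and LoadBalancer addresses stay clear of the internal ranges.
for ip in 10.10.88.73 10.10.88.80; do
  for cidr in 10.244.0.0/16 10.96.0.0/12; do
    if ip_in_cidr "$ip" "$cidr"; then
      echo "${ip} overlaps ${cidr}"
    else
      echo "${ip} is outside ${cidr}"
    fi
  done
done
```

The same helper can be pointed at any other address on your LAN before changing either CIDR.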
---
## Quick Access
### Management URLs
| Service | URL | Notes |
|---------|-----|-------|
| Kubernetes API | https://10.10.88.73:6443 | Use kubeconfig |
| Grafana | http://grafana.local | Default: admin/admin |
| Prometheus | http://prometheus.local | Metrics & queries |
| Alertmanager | http://alertmanager.local | Alert management |
| Longhorn UI | http://longhorn.local | Storage management |
| Portainer | Your Portainer instance | Connect agent: `10.10.88.81:9001` |
Add to `/etc/hosts`:
```
10.10.88.80 grafana.local prometheus.local alertmanager.local longhorn.local
```
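If you script this step, it helps to make it idempotent so repeated runs do not pile up duplicate lines. A minimal sketch with a hypothetical helper; the target file is a parameter so it can be tried on a scratch copy first.

```bash
# Append a hosts entry only if its first hostname is not already present.
# The target file is a parameter so the function can be tried on a scratch copy.
add_hosts_entry() {
  local hosts_file="$1" ip="$2"
  shift 2
  local names="$*"
  if grep -qwF -- "$1" "$hosts_file" 2>/dev/null; then
    echo "entry for $1 already present in ${hosts_file}"
    return 0
  fi
  printf '%s %s\n' "$ip" "$names" >> "$hosts_file"
}

# Real use needs root. Example call:
#   add_hosts_entry /etc/hosts 10.10.88.80 grafana.local prometheus.local \
#     alertmanager.local longhorn.local
```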
### CLI Access
```bash
# Set environment variables
export TALOSCONFIG=/path/to/cluster-config/talosconfig
export KUBECONFIG=/path/to/cluster-config/kubeconfig
# Verify cluster
kubectl get nodes
talosctl health
```
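For scripted checks, the node list can be reduced to a pass/fail result. The helpers below are a sketch, not part of the repository's scripts; they read `kubectl get nodes --no-headers` output on stdin, so they can be tried without a live cluster.

```bash
# Count nodes whose STATUS column starts with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }'
}

# Return 0 when at least the expected number of nodes is Ready.
nodes_ready() {
  local expected="$1" ready
  ready="$(count_ready_nodes)"
  echo "${ready}/${expected} nodes Ready"
  [ "$ready" -ge "$expected" ]
}

# Against a live cluster:
#   kubectl get nodes --no-headers | nodes_ready 4
```

Cordoned nodes report `Ready,SchedulingDisabled` and are counted as Ready here; tighten the pattern if that is not what you want.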
### BMC Access Setup
The deployment scripts require access to the Turing Pi BMC. Configure credentials by copying the example file:
```bash
cp .env.example .env
# Edit .env with your BMC credentials
```
Required variables in `.env`:
| Variable | Description | Default |
|----------|-------------|---------|
| `TPI_HOSTNAME` | BMC IP address | `10.10.88.70` |
| `TPI_USERNAME` | BMC login username | - |
| `TPI_PASSWORD` | BMC login password | - |
| `USE_LOCAL_TPI` | Use local tpi CLI (1) or SSH to BMC (0) | `1` |
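A script that consumes these variables might load and validate them roughly as follows. This is a sketch with a hypothetical function name; the repository's scripts may handle `.env` differently. The defaults mirror the table above.

```bash
# Load a .env file and check that the variables from the table are set.
# TPI_HOSTNAME and USE_LOCAL_TPI fall back to the documented defaults.
load_bmc_env() {
  local env_file="${1:-./.env}" missing=0
  if [ ! -f "$env_file" ]; then
    echo "missing ${env_file}; copy .env.example first" >&2
    return 1
  fi
  set -a                               # export everything assigned below
  . "$env_file"
  : "${TPI_HOSTNAME:=10.10.88.70}"
  : "${USE_LOCAL_TPI:=1}"
  set +a
  [ -n "${TPI_USERNAME:-}" ] || { echo "TPI_USERNAME is not set" >&2; missing=1; }
  [ -n "${TPI_PASSWORD:-}" ] || { echo "TPI_PASSWORD is not set" >&2; missing=1; }
  return "$missing"
}

# Example:
#   load_bmc_env .env && echo "BMC at ${TPI_HOSTNAME}"
```

Because the file is sourced as shell, keep `.env` out of version control and readable only by your user.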
Test BMC connectivity:
```bash
./scripts/wipe-cluster.sh status
```
---
## Documentation Map
### Primary Documentation
| Document | Path | Description |
|----------|------|-------------|
| Docs Index | [docs/README.md](docs/README.md) | Documentation overview |
| **Talos Installation** | [docs/INSTALLATION.md](docs/INSTALLATION.md) | Talos Linux setup guide |
| **K3s Installation** | [docs/INSTALLATION-K3S.md](docs/INSTALLATION-K3S.md) | K3s on Armbian setup guide |
| **Distribution Comparison** | [docs/COMPARISON.md](docs/COMPARISON.md) | Talos vs K3s feature matrix |
| Architecture Diagrams | [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Visual cluster architecture (Mermaid) |
| Storage Guide | [docs/STORAGE.md](docs/STORAGE.md) | Longhorn and NVMe configuration |
| Networking Guide | [docs/NETWORKING.md](docs/NETWORKING.md) | MetalLB and Ingress setup |
| Monitoring Guide | [docs/MONITORING.md](docs/MONITORING.md) | Prometheus, Grafana & external monitoring |
| Quick Reference | [docs/QUICKREF.md](docs/QUICKREF.md) | Command cheatsheet |
### Configuration Files
| File | Path | Description |
|------|------|-------------|
| Talos Config | [cluster-config/talosconfig](cluster-config/talosconfig) | Talos CLI configuration |
| Kubeconfig | [cluster-config/kubeconfig](cluster-config/kubeconfig) | Kubernetes access |
| Cluster Secrets | [cluster-config/secrets.yaml](cluster-config/secrets.yaml) | **Keep secure!** |
| MetalLB Config | [cluster-config/metallb-config.yaml](cluster-config/metallb-config.yaml) | IP pool configuration |
| Ingress Config | [cluster-config/ingress-config.yaml](cluster-config/ingress-config.yaml) | Ingress rules |
| Portainer Agent | [cluster-config/portainer-agent.yaml](cluster-config/portainer-agent.yaml) | Agent deployment |
| Prometheus Values | [cluster-config/prometheus-values.yaml](cluster-config/prometheus-values.yaml) | Monitoring stack config |
| External Scrape | [cluster-config/external-scrape-config.yaml](cluster-config/external-scrape-config.yaml) | Docker host monitoring |
### Reference Documentation
| Document | Path | Description |
|----------|------|-------------|
| Cluster Plan | [CLUSTER_PLAN.md](CLUSTER_PLAN.md) | Original deployment plan |
| Talos Schematic | [talos-schematic.yaml](talos-schematic.yaml) | Custom image configuration |
### External Resources
| Resource | URL |
|----------|-----|
| Talos Documentation | https://www.talos.dev/docs/ |
| K3s Documentation | https://docs.k3s.io/ |
| Longhorn Documentation | https://longhorn.io/docs/ |
| Turing Pi Documentation | https://docs.turingpi.com/ |
| MetalLB Documentation | https://metallb.io/ |
| NGINX Ingress | https://kubernetes.github.io/ingress-nginx/ |
| Prometheus Documentation | https://prometheus.io/docs/ |
| Grafana Documentation | https://grafana.com/docs/ |
| RKNN SDK (NPU) | https://github.com/airockchip/rknn-toolkit2 |
| RKLLM (LLM inference) | https://github.com/airockchip/rknn-llm |
---
## Directory Structure
```
turing-rk1-cluster/
├── README.md # This file
├── CLUSTER_PLAN.md # Deployment planning document
├── .env.example # Environment variables template
├── talos-schematic.yaml # Talos image customization
├── cluster-config/ # Cluster configurations
│ ├── talosconfig # Talos CLI config
│ ├── kubeconfig # Kubernetes access
│ ├── secrets.yaml # Cluster secrets (sensitive!)
│ ├── controlplane.yaml # Control plane config (Talos)
│ ├── worker.yaml # Worker config (Talos)
│ ├── metallb-config.yaml # MetalLB IP pool
│ ├── ingress-config.yaml # Ingress rules
│ ├── prometheus-values.yaml # Monitoring stack config
│ ├── external-scrape-config.yaml # External targets
│ └── *.yaml # Other configurations
├── scripts/ # Automation scripts
│ ├── deploy-talos-cluster.sh # Automated Talos deployment
│ ├── talos-cluster-status.sh # Cluster health and status checker
│ ├── setup-k3s-node.sh # Armbian node preparation
│ ├── deploy-k3s-cluster.sh # K3s cluster deployment
│ └── wipe-cluster.sh # Cluster reset/migration tool
├── docs/ # Documentation
│ ├── README.md # Docs index
│ ├── INSTALLATION.md # Talos setup guide
│ ├── INSTALLATION-K3S.md # K3s on Armbian setup guide
│ ├── COMPARISON.md # Talos vs K3s comparison
│ ├── ARCHITECTURE.md # Cluster architecture diagrams
│ ├── STORAGE.md # Storage guide
│ ├── NETWORKING.md # Network guide
│ ├── MONITORING.md # Monitoring guide
│ └── QUICKREF.md # Quick reference
├── images/ # Talos images
│ └── latest/
│ └── metal-arm64.raw # Current Talos image
└── repo/ # Submodules/repos
├── sbc-rockchip/ # Talos Rockchip overlay
├── rknn-toolkit2/ # RKNN SDK v2.3.2 (for K3s)
├── rknn-llm/ # RKLLM v1.2.3 (for K3s)
└── rknn_model_zoo/ # Pre-built models (for K3s)
```
---
## Security Notes
1. **Secrets Protection**: `cluster-config/secrets.yaml` contains cluster credentials. Keep it secure and never commit it to a public repository.
2. **BMC Access**: The BMC (10.10.88.70) has full control over all nodes. Restrict network access appropriately.
3. **Privileged Workloads**: Many add-ons require privileged namespace labels. Review security implications before deploying untrusted workloads.
4. **Network Segmentation**: Consider isolating the cluster network (10.10.88.x) from untrusted networks.
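As a small aid for the first point, file permissions on the secrets file can be checked and tightened from the repository root. A sketch with a hypothetical helper name:

```bash
# Return 0 if the file grants no permissions to group or others.
# Parses `ls -ld` so it behaves the same with GNU and BSD userlands.
is_private_file() {
  local file="$1" mode
  mode="$(ls -ld -- "$file" 2>/dev/null | cut -c5-10)"
  [ "$mode" = "------" ]
}

secrets="cluster-config/secrets.yaml"
if [ -e "$secrets" ] && ! is_private_file "$secrets"; then
  echo "tightening permissions on ${secrets}"
  chmod 600 "$secrets"
fi
```

This only covers local file permissions; keeping the file out of Git history still has to be handled separately.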
---
## Contributing
This is a personal homelab cluster. Configuration files and documentation are provided as-is for reference.
## License
Configuration files and documentation are provided under the MIT license. Third-party components retain their original licenses.