Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/frank-at-suse/vsphere_ha_autoscale_cluster
Terraform plan for creating an HA, autoscaled multi-node RKE2 cluster on VMware vSphere
- Host: GitHub
- URL: https://github.com/frank-at-suse/vsphere_ha_autoscale_cluster
- Owner: frank-at-suse
- License: mpl-2.0
- Created: 2022-09-08T21:30:22.000Z
- Default Branch: master
- Last Pushed: 2023-10-06T16:01:14.000Z
- Topics: autoscaler, high-availability, kubernetes, linux, rancher-server, rancher2, rke2, terraform, vmware-vsphere
- Language: HCL
- Size: 52.7 KB
- Stars: 8
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# RKE2 Cluster with Autoscaling & API Server HA
![Rancher](https://img.shields.io/badge/rancher-%230075A8.svg?style=for-the-badge&logo=rancher&logoColor=white) ![Terraform](https://img.shields.io/badge/terraform-%235835CC.svg?style=for-the-badge&logo=terraform&logoColor=white) ![Kubernetes](https://img.shields.io/badge/kubernetes-%23326ce5.svg?style=for-the-badge&logo=kubernetes&logoColor=white)
## Reason for Being
This Terraform plan is for creating a multi-node RKE2 cluster in vSphere with machine pool autoscaling via [upstream K8s Cluster Autoscaler](https://github.com/kubernetes/autoscaler) & API Server HA via a [kube-vip](https://kube-vip.io/) DaemonSet manifest - both of these are common asks and bring our cluster some "cloud-provider-like" behaviors in the comfort of our own datacenter.
## Environment Prerequisites
- Functional Rancher Management Server with vSphere Cloud Credential
- vCenter >= 7.x and credentials with appropriate permissions
- Virtual Machine Hardware Compatibility at Version >= 15
- Create the following in the files/ directory:

| NAME | PURPOSE |
|:-----|:--------|
| .rancher-api-url | URL for Rancher Management Server |
| .rancher-bearer-token | API bearer token generated via Rancher UI |
| .ssh-public-key | SSH public key for additional OS user |

- Because this plan leverages BGP for K8s Control Plane load balancing, a router capable of BGP is required. For lab/dev/test use, a small single-CPU Linux VM running the [BIRD v2 daemon](https://bird.network.cz/?get_doc&f=bird.html&v=20) (`sudo apt install bird2`) with the following config would suffice:
```
protocol bgp kubevip {
        description "kube-vip for Cluster CP";
        local as 64513;
        neighbor range as ;
        graceful restart;
        ipv4 {
                import filter {accept;};
                export filter {reject;};
        };
        dynamic name "kubeVIP";
}
```

## Caveats
The `cluster_autoscaler.tf` plan includes the following values in `ExtraArgs`:
```terraform
skip-nodes-with-local-storage: false
skip-nodes-with-system-pods: false
```

These exist to make the autoscaler logic easier to demonstrate and should be **_used with caution_** in production or any other environment you care about, as they could incur data loss or workload instability.
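For production use, the safer choice is to leave both flags at their upstream defaults of `true`. A hedged sketch of what that could look like in the plan's `ExtraArgs` map (the surrounding attribute name is assumed here, not taken from `cluster_autoscaler.tf`):

```terraform
# Hedged sketch (attribute name assumed): upstream defaults, so nodes
# with local storage or non-DaemonSet system pods are NOT considered
# for scale-down.
extra_args = {
  "skip-nodes-with-local-storage" = "true"
  "skip-nodes-with-system-pods"   = "true"
}
```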
---
The `lifecycle` block in `cluster.tf` is somewhat fragile:
```terraform
lifecycle {
ignore_changes = [
rke_config[0].machine_pools[1].quantity
]
}
```

Starting from the `[0]` value, Terraform orders these indices lexicographically: the "worker" pool is `machine_pools[1]` and the "ctl_plane" pool is `machine_pools[0]` for no reason other than that "worker" sorts after "ctl_plane" alphabetically. Consequently, if the "ctl_plane" pool were renamed to something like "x_ctl_plane", the wrong machine pool would occupy the `machine_pools[1]` index, causing undesired behavior. To prevent this, basic variable validation is in place that forces MachinePool names to begin with `ctl-plane` and `worker`; otherwise the following error is thrown:
```bash
Err: MachinePool names must begin with 'ctl-plane' for Control Plane Node Pool & 'worker' for Autoscaling Worker Node
Pool.
```

## To Run
```bash
terraform apply
```

Node pool min/max values are annotations that can be adjusted via the `rancher_env.autoscale_annotations` variable. Changing these values on a live cluster will not trigger a redeploy. Any node in the autoscaled pool selected for scale-down and/or deletion will have a Taint applied that is visible in the Rancher UI:
![autoscaler](https://user-images.githubusercontent.com/88675306/189248687-4b949567-ebd0-460e-a42e-d13dc1706410.png)
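As an illustration, the min/max bounds could be expressed through that variable. The annotation keys shown are the upstream Cluster Autoscaler's Cluster API node-group size annotations; the exact structure of `rancher_env` is assumed from its name, so check the plan's variable definition before copying:

```terraform
# Hedged sketch (variable structure assumed): the upstream Cluster
# Autoscaler's Cluster API provider reads these node-group size
# annotations to bound scaling.
rancher_env = {
  autoscale_annotations = {
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size" = "1"
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size" = "3"
  }
}
```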
## Tested Versions
| SOFTWARE | VERSION |
|:---------|:--------|
| K8s Cluster Autoscaler | 1.26.2 |
| kube-vip | 0.6.2 |
| Rancher Server | 2.7.6 |
| Rancher Terraform Provider | 3.1.1 |
| RKE2 | 1.26.8+rke2r1 |
| Terraform | 1.4.6 |
| vSphere | 8.0.1.00300 |
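To pin a run to the tested versions above, a matching `required_providers` block could look like the following sketch (the registry source address is the standard one for the Rancher2 provider; whether the plan pins versions this way is an assumption):

```terraform
# Hedged sketch: version pins matching the Tested Versions table.
terraform {
  required_version = ">= 1.4.6"

  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = ">= 3.1.1"
    }
  }
}
```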