# RKE2 Cluster with vSphere CPI/CSI & kube-vip

![Rancher](https://img.shields.io/badge/rancher-%230075A8.svg?style=for-the-badge&logo=rancher&logoColor=white) ![Terraform](https://img.shields.io/badge/terraform-%235835CC.svg?style=for-the-badge&logo=terraform&logoColor=white) ![Kubernetes](https://img.shields.io/badge/kubernetes-%23326ce5.svg?style=for-the-badge&logo=kubernetes&logoColor=white)

## Reason for Being

This Terraform plan creates a multi-node, CIS-benchmarked RKE2 cluster with vSphere CPI/CSI & kube-vip installed and configured. RKE2's NGINX Ingress Controller is also exposed as a LoadBalancer service to work in concert with kube-vip. Along with those quality-of-life additions, this cluster plan takes the standard RKE2 security posture a couple of steps further by installing with the [CIS 1.23](https://docs.rke2.io/security/cis_self_assessment123) profile enabled, using Calico's Wireguard backend to encrypt pod-to-pod communication, & enforcing the use of TLS 1.3 across Control Plane components.
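As a hedged sketch of where those settings live (the attribute names come from the `rancher2` provider's `rancher2_cluster_v2` schema; the values shown are assumptions, not a copy of `cluster.tf`):

```hcl
# Illustrative fragment only - not the plan's exact contents.
resource "rancher2_cluster_v2" "rke2" {
  name               = "hardened-rke2" # hypothetical name
  kubernetes_version = "v1.26.8+rke2r1"

  rke_config {
    # RKE2 server flags, passed as YAML via HereDoc
    machine_global_config = <<-EOF
      profile: "cis-1.23"              # enable the CIS 1.23 hardening profile
      kube-apiserver-arg:
        - tls-min-version=VersionTLS13 # enforce TLS 1.3 on Control Plane components
      kube-controller-manager-arg:
        - tls-min-version=VersionTLS13
      kube-scheduler-arg:
        - tls-min-version=VersionTLS13
    EOF

    # Calico's Wireguard backend for pod-to-pod encryption
    # (exact key is an assumption and depends on the rke2-calico chart version)
    chart_values = <<-EOF
      rke2-calico:
        felixConfiguration:
          wireguardEnabled: true
    EOF
  }
}
```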

There is a lot of HereDoc in the `rke_config` section of `cluster.tf` so that it's easier to see what's going on - you'll probably want to put this info in a template file to keep the plan a bit neater than what's seen here.
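A minimal sketch of that refactor, assuming a hypothetical `files/machine_global_config.yaml.tftpl` template:

```hcl
# Hypothetical refactor - render the HereDoc from a template file instead.
rke_config {
  machine_global_config = templatefile("${path.module}/files/machine_global_config.yaml.tftpl", {
    cis_profile = "cis-1.23" # variables the template interpolates
  })
}
```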

Some operating systems will run containerd within the "systemd" control group and the Kubelet within the "cgroupfs" control group - this plan passes the Kubelet a `--cgroup-driver=systemd` argument to ensure that only a single cgroup manager is running, better aligning the cluster with upstream K8s recommendations.
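In RKE2 that argument is passed through `kubelet-arg`; a hedged sketch of the relevant fragment (its exact placement within the plan's HereDoc may differ):

```hcl
rke_config {
  machine_global_config = <<-EOF
    kubelet-arg:
      - cgroup-driver=systemd # match containerd's systemd cgroup driver
  EOF
}
```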

## Static IP Addressing

Static IPs _can_ be implemented if needed. Firstly, a [Network Protocol Profile needs to be created in vSphere](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-D24DBAA0-68BD-49B9-9744-C06AE754972A.html). After the profile is created, two parts of this Terraform plan need to be changed: `cloud-init` and the `rancher2_machine_config_v2` resource in `cluster.tf`.

1. A script must be added with `write_files` and executed via `runcmd` in `cloud-init`. This script gathers instance metadata via vmtools and then applies it (the example below uses Netplan; your OS may use something different):

```yaml
- content: |
    #!/bin/bash
    vmtoolsd --cmd 'info-get guestinfo.ovfEnv' > /tmp/ovfenv
    IPAddress=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.address" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    SubnetMask=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.netmask" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    Gateway=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.route.0.gateway" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    DNS=$(sed -n 's/.*Property oe:key="guestinfo.dns.servers" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)

    # NOTE: the interface name is environment-specific, and netplan expects a
    # prefix length - convert $SubnetMask if your Network Protocol Profile
    # supplies a dotted-quad mask.
    cat > /etc/netplan/01-netcfg.yaml <<EOF
    network:
      version: 2
      ethernets:
        ens192:
          addresses: [$IPAddress/$SubnetMask]
          gateway4: $Gateway
          nameservers:
            addresses: [$DNS]
    EOF
    netplan apply
  path: /root/set-static-ip.sh # illustrative path/filename
  permissions: "0744"
```

2. vApp options must be set on the `rancher2_machine_config_v2` resource so that vSphere injects the Network Protocol Profile values into `guestinfo` (here `<port-group-name>` is a placeholder for the vSphere network backing the profile, and the allocation policy/protocol lines are assumptions required for `$${ip:...}` allocation):

```hcl
vapp_ip_allocation_policy = each.key == "ctl_plane" ? "fixedAllocated" : null
vapp_ip_protocol          = each.key == "ctl_plane" ? "IP4" : null
vapp_property = each.key == "ctl_plane" ? [
  "guestinfo.interface.0.ip.0.address=$${ip:<port-group-name>}",
  "guestinfo.interface.0.ip.0.netmask=$${netmask:}",
  "guestinfo.interface.0.route.0.gateway=$${gateway:}",
  "guestinfo.dns.servers=$${dns:}",
] : null
vapp_transport = each.key == "ctl_plane" ? "com.vmware.guestInfo" : null
```

Using static IPs comes with some small caveats:

- In lieu of "traditional" `cloud-init` logic to handle OS updates/upgrades & package installs:

```yaml
package_reboot_if_required: true
package_update: true
package_upgrade: true
packages:
-
```

Scripting would need to be introduced to take care of this later in the `cloud-init` run, if desired (e.g. a `write_files` entry using `defer: true`, as sketched below). Because the static IP is not applied until the script runs via `runcmd`, the node has no IP address earlier in the `cloud-init` process, so any `package*` logic requiring network access would fail.
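A minimal sketch of that workaround, assuming a Debian/Ubuntu guest and the illustrative script paths used above:

```yaml
write_files:
  # Written during the final cloud-init stage thanks to defer: true
  - path: /root/pkg-maintenance.sh
    permissions: "0744"
    defer: true
    content: |
      #!/bin/bash
      apt-get update && apt-get -y upgrade # swap in your OS's package manager
runcmd:
  - /root/set-static-ip.sh   # bring networking up first (step 1 above)
  - /root/pkg-maintenance.sh # then run deferred package maintenance
```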

## Environment Prerequisites

- Functional Rancher Management Server with vSphere Cloud Credential
- vCenter >= 7.x and credentials with appropriate permissions (see [vSphere Permissions section](./README.md#vsphere-permissions))
- Virtual Machine Hardware Compatibility at Version >= 15
- Create the following in the `files/` directory (consumed roughly as sketched after the table):

| NAME | PURPOSE |
|:-----|:--------|
| .rancher-api-url | URL for Rancher Management Server |
| .rancher-bearer-token | API bearer token generated via Rancher UI |
| .ssh-public-key | SSH public key for additional OS user |
| .vsphere-passwd | Password associated with vSphere CPI/CSI credential |
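
A hedged sketch of how the plan might consume these files (`api_url` and `token_key` are real `rancher2` provider arguments; the file-reading pattern and names here are assumptions):

```hcl
# Hypothetical wiring - read the dotfiles above into the rancher2 provider.
provider "rancher2" {
  api_url   = trimspace(file("${path.module}/files/.rancher-api-url"))
  token_key = trimspace(file("${path.module}/files/.rancher-bearer-token"))
}
```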

## vSphere Permissions

For required vSphere CPI & CSI account permissions see [HERE](https://github.com/rancher/barn/tree/main/Walkthroughs/vSphere/Permissions).

## Caveats

- vSphere CSI volumes are **RWO only** unless using vSAN Datastore
- Using Wireguard as the CNI backend comes with a performance penalty
- kube-vip is configured in L2 mode, so **_ALL_** LoadBalancer service traffic goes **_only_** to the node that has the VIP assigned, which is not suitable for production

## To Run

```bash
terraform init  # first run only - fetches the required providers
terraform apply
```

## Tested Versions

| SOFTWARE | VERSION |
|:---------|:--------|
| kube-vip | 0.6.2 |
| Rancher Server | 2.7.6 |
| Rancher Terraform Provider | 3.1.1 |
| RKE2 | 1.26.8+rke2r1 |
| Terraform | 1.4.6 |
| vSphere | 8.0.1.00300 |