An open API service indexing awesome lists of open source software.

https://github.com/hcloud-k8s/terraform-hcloud-kubernetes

Terraform Module to Deploy a Highly Available, Production-Ready Talos Kubernetes Cluster on Hetzner Cloud
https://github.com/hcloud-k8s/terraform-hcloud-kubernetes

cloud hcloud hetzner k8s kube kubernetes linux talos terraform

Last synced: 19 days ago
JSON representation

Terraform Module to Deploy a Highly Available, Production-Ready Talos Kubernetes Cluster on Hetzner Cloud

Awesome Lists containing this project

README

          

logo

Hcloud Kubernetes


Terraform Module to deploy Kubernetes on Hetzner Cloud!



last update


last update


stars


forks


OpenTofu Module


Terraform Registry downloads


open issues


open pull requests


discussions


license


# πŸ“” Overview
- [🌟 About the Project](#-about-the-project)
- [πŸš€ Getting Started](#-getting-started)
- [βš™οΈ Advanced Configuration](#%EF%B8%8F-advanced-configuration)
- [♻️ Lifecycle](#%EF%B8%8F-lifecycle)
- [❀️ Support this Project](#%EF%B8%8F-support-this-project)
- [πŸ‘‹ Community](#-community)
- [πŸ“Ž Project Info](#-project-info)

## 🌟 About the Project
Hcloud Kubernetes is a Terraform module for deploying a fully declarative, managed Kubernetes cluster on Hetzner Cloud. It utilizes Talos, a secure, immutable, and minimal operating system specifically designed for Kubernetes, featuring a streamlined architecture with only a handful of binaries and shared libraries. Just enough to run containerd and a small set of system services.

This project is committed to production-grade configuration and lifecycle management, ensuring all components are set up for high availability. It includes a curated selection of widely used and officially recognized Kubernetes components. If you encounter any issues, suboptimal settings, or missing elements, please file an [issue](https://github.com/hcloud-k8s/terraform-hcloud-kubernetes/issues) to help us improve this project.

> [!TIP]
> If you don’t have a Hetzner account yet, you can use this [Hetzner Cloud Referral Link](https://hetzner.cloud/?ref=GMylKeDmqtsD) to claim a €20 credit and support this project at the same time.

### ✨ Features

Provision a highly available and secure Kubernetes cluster on Hetzner Cloud, defined by these key features:

* **Immutable Infrastructure:** Uses Talos Linux to deliver a fully declarative, immutable Kubernetes cluster.
* **Multi-Architecture:** Supports deployment on both **AMD64** and **ARM64** instances, with automated image builds.
* **High Availability:** Provides high availability across control plane and worker components for reliable operation.
* **Autoscaling:** Supports automatic scaling of both **Nodes** and **Pods** to seamlessly handle dynamic workloads.
* **Quick Start:** Optional **Gateway API**, **Cert Manager**, and **Longhorn** integrations for faster app deployment.
* **Dual-Stack:** Load balancers provide native **IPv4** and **IPv6** connectivity with **PROXY Protocol** support.
* **Isolated Network:** Keeps all internal cluster traffic confined to an isolated, private Hetzner Cloud Network.
* **Security:** Security-first architecture with perimeter firewalls and encryption for data in transit and at rest.

### πŸ“¦ Components
This project bundles essential Kubernetes components, preconfigured for seamless operation on Hetzner Cloud:
-
Talos Cloud Controller Manager
Talos Cloud Controller Manager (CCM)

Manages node resources by updating with cloud metadata, handling lifecycle deletions, and automatically approving node CSRs.
-
Talos Backup
Talos Backup

Automates etcd snapshots and S3 storage for backup in Talos Linux-based Kubernetes clusters.
-
Hcloud Cloud Controller Manager
Hcloud Cloud Controller Manager (CCM)

Manages the integration of Kubernetes clusters with Hetzner Cloud services, ensuring the update of node data, private network traffic control, and load balancer setup.
-
Hcloud Container Storage Interface
Hcloud Container Storage Interface (CSI)

Provides persistent storage for Kubernetes using Hetzner Cloud Volumes, supporting encryption and dynamic provisioning.
-
Longhorn
Longhorn

Distributed block storage for Kubernetes, providing high availability, snapshots, and automatic replica rebuilding for easy persistent volume management.
-
Cilium Container Network Interface
Cilium Container Network Interface (CNI)

A high performance CNI plugin that enhances and secures network connectivity and observability for container workloads through the use of eBPF technology in Linux kernels.
-
Cilium Gateway API
Cilium Gateway API

Cilium Gateway API implements the Kubernetes Gateway API using eBPF for traffic steering and policy enforcement, with Envoy providing Layer 7 proxying for HTTP and TLS routing.
-
Cert Manager
Cert Manager

Automates the management of certificates in Kubernetes, handling the issuance and renewal of certificates from various sources like Let's Encrypt, and ensures certificates are valid and updated.
-
Cert Manager Webhook Hetzner
Cert Manager Webhook Hetzner

Adds Hetzner DNS support for Cert Manager ACME DNS-01 challenges, including wildcard certificates and certificates that do not depend on public HTTP routing.
-
Cluster Autoscaler
Cluster Autoscaler

Dynamically adjusts Kubernetes cluster size based on resource demands and node utilization, scaling nodes in or out to optimize cost and performance.
-
Metrics Server
Metrics Server

Collects and provides container resource metrics for Kubernetes, enabling features like autoscaling by interacting with Horizontal and Vertical Pod Autoscalers.

### πŸ›‘οΈ Security
Talos Linux is a secure, minimal, and immutable OS for Kubernetes, removing SSH and shell access to reduce attack surfaces. Managed through a secure API with mTLS, Talos prevents configuration drift, enhancing both security and predictability. It follows [NIST](https://www.nist.gov/publications/application-container-security-guide) and [CIS](https://www.cisecurity.org/benchmark/kubernetes) hardening standards, operates in memory, and is built to support modern, production-grade Kubernetes environments.

**Perimeter Security:** External access to cluster nodes is controlled and restricted using [Hetzner Cloud Firewall](https://docs.hetzner.com/cloud/firewalls/).

**Network Policy:** Internal cluster traffic can be governed by Kubernetes Network Policies using [Cilium CNI](https://docs.cilium.io/en/stable/network/kubernetes/policy/).

**Encryption in Transit:** Pod network traffic is transparently encrypted by Cilium using [WireGuard](https://docs.cilium.io/en/latest/security/network/encryption-wireguard/) by default, with optional support for [IPsec](https://docs.cilium.io/en/latest/security/network/encryption-ipsec/).

**Encryption at Rest:** The [STATE](https://www.talos.dev/latest/learn-more/architecture/#file-system-partitions) and [EPHEMERAL](https://www.talos.dev/latest/learn-more/architecture/#file-system-partitions) partitions are encrypted by default using [Talos Disk Encryption](https://www.talos.dev/latest/talos-guides/configuration/disk-encryption/) with LUKS2. Each node is secured with an individual encryption key derived from its unique `nodeID`.

## πŸš€ Getting Started

### βœ… Prerequisites

- [terraform](https://developer.hashicorp.com/terraform/install) or [tofu](https://opentofu.org/docs/intro/install/) to deploy the Cluster
- [packer](https://developer.hashicorp.com/packer/install) to upload Talos Images
- [curl](https://curl.se) and [jq](https://jqlang.org/download/) for API Communication
- [talosctl](https://www.talos.dev/latest/talos-guides/install/talosctl) to control the Talos Cluster
- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) to control Kubernetes (optional)

> [!IMPORTANT]
> Keep the CLI tools up to date. Ensure that `talosctl` matches your Talos version for compatibility, especially before a Talos upgrade.

### 🎯 Installation

Create `kubernetes.tf` file with the module configuration:
```hcl
module "kubernetes" {
source = "hcloud-k8s/kubernetes/hcloud"
version = ""

cluster_name = "k8s"
hcloud_token = ""

# Export configs for talosctl and kubectl (optional)
cluster_kubeconfig_path = "kubeconfig"
cluster_talosconfig_path = "talosconfig"

# Enable Cilium Gateway API and Cert Manager (optional)
cert_manager_enabled = true
cilium_gateway_api_enabled = true

control_plane_nodepools = [
{ name = "control", type = "cpx22", location = "nbg1", count = 3 }
]
worker_nodepools = [
{ name = "worker", type = "cpx22", location = "nbg1", count = 3 }
]
}
```
> [!NOTE]
> Each Control Plane node requires at least 4GB of memory and each Worker node at least 2GB. For High-Availability (HA), at least 3 Control Plane nodes and 3 Worker nodes are required.

Initialize and deploy the cluster:

**Terraform:**
```sh
terraform init -upgrade
terraform apply
```
**OpenTofu:**
```sh
tofu init -upgrade
tofu apply
```

### πŸ”‘ Cluster Access

Set config file locations:
```sh
export TALOSCONFIG=talosconfig
export KUBECONFIG=kubeconfig
```

Display cluster nodes:
```sh
talosctl get member
kubectl get nodes -o wide
```

Display all pods:
```sh
kubectl get pods -A
```

For more detailed information and examples, please visit:
- [Talos CLI Documentation](https://www.talos.dev/latest/reference/cli/)
- [Kubernetes CLI Documentation](https://kubernetes.io/docs/reference/kubectl/introduction/)

### πŸ’₯ Teardown
To destroy the cluster, first disable the delete protection by setting:
```hcl
cluster_delete_protection = false
```

Apply this change before proceeding. Once the delete protection is disabled, you can teardown the cluster.

**Terraform:**
```sh
terraform destroy
```
**OpenTofu:**
```sh
tofu destroy
```

## βš™οΈ Advanced Configuration

Cluster Access

#### Public Cluster Access
By default, the cluster is accessible over the public internet. The firewall is automatically configured to use the IPv4 address and /64 IPv6 CIDR of the machine running this module. To disable this automatic configuration, set the following variables to `false`:

```hcl
firewall_use_current_ipv4 = false
firewall_use_current_ipv6 = false
```

To manually specify source networks for the Talos API and Kube API, configure the `firewall_api_source` variable as follows:
```hcl
firewall_api_source = [
"1.2.3.0/32",
"1:2:3::/64"
]
```
This allows explicit control over which networks can access your APIs, overriding the default behavior when set.

#### Internal Cluster Access
If your internal network is routed and accessible, you can directly access the cluster using internal IPs by setting:
```hcl
cluster_access = "private"
```

For integrating Talos nodes with an internal network, configure a default route (`0.0.0.0/0`) in the Hetzner Network to point to your router or gateway. Additionally, add specific routes on the Talos nodes to encompass your entire network CIDR:
```hcl
talos_extra_routes = ["10.0.0.0/8"]

# Optionally, disable NAT for your globally routed CIDR
network_native_routing_ipv4_cidr = "10.0.0.0/8"

# Optionally, use an existing Network
hcloud_network_id = 123456789
```
This setup ensures that the Talos nodes can route traffic appropriately across your internal network.

#### Access to Kubernetes API

Optionally, a hostname can be configured to direct access to the Kubernetes API through a node IP, load balancer, or Virtual IP (VIP):
```hcl
kube_api_hostname = "kube-api.example.com"
```

##### Access from Public Internet
For accessing the Kubernetes API from the public internet, choose one of the following options based on your needs:
1. **Use single Control Plane IP (default):**

By default the IP address of a single Control Plane node is used to access the Kube API.
2. **Use a Load Balancer:**

Deploy a load balancer to manage API traffic, enhancing availability and load distribution.
```hcl
kube_api_load_balancer_enabled = true
```
3. **Use a Virtual IP (Floating IP):**

A Floating IP is configured to automatically move between control plane nodes in case of an outage, ensuring continuous access to the Kubernetes API.
```hcl
control_plane_public_vip_ipv4_enabled = true

# Optionally, specify an existing Floating IP
control_plane_public_vip_ipv4_id = 123456789
```

##### Access from Internal Network
When accessing the Kubernetes API via an internal network, an internal Virtual IP (Alias IP) is utilized by default to route API requests within the network. This feature can be disabled with the following configuration:
```hcl
control_plane_private_vip_ipv4_enabled = false
```

To enhance internal availability, a load balancer can be used:
```hcl
kube_api_load_balancer_enabled = true
```

This setup ensures secure and flexible access to the Kubernetes API, accommodating different networking environments.

Cluster Autoscaler
The Cluster Autoscaler dynamically adjusts the number of nodes in a Kubernetes cluster based on the demand, ensuring that there are enough nodes to run all pods and no unneeded nodes when the workload decreases.

Example `kubernetes.tf` snippet:
```hcl
# Configuration for cluster autoscaler node pools
cluster_autoscaler_nodepools = [
{
name = "autoscaler"
type = "cpx22"
location = "nbg1"
min = 0
max = 6
labels = { "autoscaler-node" = "true" }
taints = [ "autoscaler-node=true:NoExecute" ]
}
]
```

Optionally, pass additional [Helm values](https://github.com/kubernetes/autoscaler/blob/master/charts/cluster-autoscaler/values.yaml) to the cluster autoscaler configuration:
```hcl
cluster_autoscaler_helm_values = {
extraArgs = {
enforce-node-group-min-size = true
scale-down-delay-after-add = "45m"
scale-down-delay-after-delete = "4m"
scale-down-unneeded-time = "5m"
}
}
```

##### Talos Upgrades and Configuration Changes
Cluster Autoscaler does not support upgrading nodes or changing their configuration, as its primary purpose is to manage short-lived nodes that handle load peaks. If you require long-lived autoscaled nodes, you can upgrade them manually using `talosctl` or use this Terraform module, which supports discovery of autoscaled nodes and manages their upgrades and configuration changes.

To enable this feature, add the following to your configuration:
```hcl
cluster_autoscaler_discovery_enabled = true
```

Please note that errors may occur if a node pool has been scaled down recently, as Talos caches absent nodes for up to [30 minutes](https://www.talos.dev/latest/introduction/troubleshooting/#removed-members-are-still-present). You can pause automatic scaling by stopping the Cluster Autoscaler pods:
```sh
kubectl -n kube-system scale deployment cluster-autoscaler-hetzner-cluster-autoscaler --replicas=0
```

Cilium Advanced Configuration

#### Cilium CIDR Policy Match Mode

By default, when using K8s NetworkPolicies CIDR-based selectors in Cilium do not match in-cluster entities (pods or nodes). This is however required when using NetworkPolicies that control traffic to the kube-apiserver pods as these use the Talos node IPs in our setup.
Cilium's policy engine can be directed to select nodes by CIDR / ipBlock. This requires you to add the following to your configuration:

```hcl
cilium_policy_cidr_match_mode = "nodes"
```

It is safe to toggle this option on a running cluster, and toggling the option affects neither upgrades nor downgrades
More information can be found in [the Cilium docs.](https://docs.cilium.io/en/stable/security/policy/language/#selecting-nodes-with-cidr-ipblock)

#### Cilium Transparent Encryption

This module enables [Cilium Transparent Encryption](https://cilium.io/use-cases/transparent-encryption/) feature by default.

All pod network traffic is encrypted using WireGuard (Default) or protocols, includes automatic key rotation and efficient in-kernel encryption, covering all traffic types.

πŸ’‘ Although WireGuard is the default option, Hetzner Cloud VMs supports AES-NI instruction set, making IPSec encryption more CPU-efficient compared to WireGuard. Consider enabling IPSec for CPU savings through hardware acceleration.

IPSec mode supports RFC4106 AES-GCM encryption with 128, 192 and 256 bits key sizes.

**⚠️ IPSec encryption has the following limitations:**

- No transparent encryption when chaining Cilium with other CNI plugins
- Host Policies not supported with IPSec
- Incompatible with BPF Host Routing (automatically disabled on switch)
- IPv6-only clusters not supported
- Maximum 65,535 nodes per cluster/clustermesh
- Single CPU core limitation per IPSec tunnel may affect high-throughput scenarios

*Source: [Cilium Documentation](https://docs.cilium.io/en/stable/security/network/encryption-ipsec/#limitations)*

Example `kubernetes.tf` configuration:

```hcl
cilium_encryption_enabled = true # Default true
cilium_encryption_type = "wireguard" # wireguard (Default) | ipsec
cilium_ipsec_algorithm = "rfc4106(gcm(aes))" # IPSec AES key algorithm (Default rfc4106(gcm(aes)))
cilium_ipsec_key_size = 256 # IPSec AES key size (Default 256)
cilium_ipsec_key_id = 1 # IPSec key ID (Default 1)
```

##### IPSec Key Rotation

Keys automatically rotate when `cilium_ipsec_key_id` is incremented (1-15 range, resets to 1 after 15).

Egress Gateway

Cilium offers an Egress Gateway to ensure network compatibility with legacy systems and firewalls requiring fixed IPs. The use of Cilium Egress Gateway does not provide high availability and increases latency due to extra network hops and tunneling. Consider this configuration only as a last resort.

Example `kubernetes.tf` snippet:
```hcl
# Enable Cilium Egress Gateway
cilium_egress_gateway_enabled = true

# Define worker nodepools including an egress-specific node pool
worker_nodepools = [
# ... (other node pool configurations)
{
name = "egress"
type = "cpx22"
location = "nbg1"
labels = { "egress-node" = "true" }
taints = [ "egress-node=true:NoSchedule" ]
}
]
```

Example Egress Gateway Policy:
```yml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
name: sample-egress-policy
spec:
selectors:
- podSelector:
matchLabels:
io.kubernetes.pod.namespace: sample-namespace
app: sample-app

destinationCIDRs:
- "0.0.0.0/0"

egressGateway:
nodeSelector:
matchLabels:
egress-node: "true"
```

Please visit the Cilium [documentation](https://docs.cilium.io/en/stable/network/egress-gateway) for more details.

Firewall Configuration
By default, a firewall is configured that can be extended with custom rules. If no egress rules are configured, outbound traffic remains unrestricted. However, inbound traffic is always restricted to mitigate the risk of exposing Talos nodes to the public internet, which could pose a serious security vulnerability.

Each rule is defined with the following properties:
- `description`: A brief description of the rule.
- `direction`: The direction of traffic (`in` for inbound, `out` for outbound).
- `source_ips`: A list of source IP addresses for outbound rules.
- `destination_ips`: A list of destination IP addresses for inbound rules.
- `protocol`: The protocol used (valid options: `tcp`, `udp`, `icmp`, `gre`, `esp`).
- `port`: The port number (required for `tcp` and `udp` protocols, must not be specified for `icmp`, `gre`, and `esp`).

Example `kubernetes.tf` snippet:
```hcl
firewall_extra_rules = [
{
description = "Custom UDP Rule"
direction = "in"
source_ips = ["0.0.0.0/0", "::/0"]
protocol = "udp"
port = "12345"
},
{
description = "Custom TCP Rule"
direction = "in"
source_ips = ["1.2.3.4", "1:2:3:4::"]
protocol = "tcp"
port = "8080-9000"
},
{
description = "Allow ICMP"
direction = "in"
source_ips = ["0.0.0.0/0", "::/0"]
protocol = "icmp"
}
]
```

For access to Talos and the Kubernetes API, please refer to the [Cluster Access](#public-cluster-access) configuration section.

Gateway API

Kubernetes Gateway API is the modern replacement for Kubernetes Ingress. It fixes many Ingress limitations by offering a richer, more consistent model for traffic management, and it's designed to support multiple [Gateway API implementations](https://gateway-api.sigs.k8s.io/implementations/#conformant) in parallel.

This module installs the Gateway API CRDs by default and deploys Cert Manager with Gateway API support enabled.

### Example with Cilium Gateway API and Cert Manager TLS Certificate

To use Cilium's Gateway API implementation, configure:
```hcl
cilium_gateway_api_enabled = true
```

If Cert Manager and Cilium weren't initially deployed/configured with Gateway API support enabled, you may need to restart their controllers to pick up the new configuration:
```shell
kubectl -n cert-manager rollout restart deployment
kubectl -n kube-system rollout restart deployment/cilium-operator
```

#### Create a Cert Manager `Issuer` (Let's Encrypt HTTP-01 via Gateway API)

`Issuer` configures Cert Manager to request TLS certificates from Let's Encrypt using the ACME `HTTP-01` challenge. The important part is the `gatewayHTTPRoute` solver: when Cert Manager needs to prove domain ownership, it will temporarily create/attach an HTTPRoute under the referenced `Gateway` and serve the `/.well-known/acme-challenge/...` response there.

Replace the placeholder email address, and note that `privateKeySecretRef` is the secret where Cert Manager stores your ACME account key (not the issued certificate).

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-http01
namespace: default
spec:
acme:
email:
server: https://acme-v02.api.letsencrypt.org/directory
privateKeySecretRef:
name: letsencrypt-http01-key
solvers:
- http01:
gatewayHTTPRoute:
parentRefs:
- name: cilium-gateway
namespace: default
kind: Gateway
```

#### Optional: Create a Cert Manager `ClusterIssuer` (Let's Encrypt DNS-01 via Hetzner DNS)

To request wildcard certificates or certificates that do not depend on public HTTP routing, enable the Hetzner DNS webhook:

```hcl
cert_manager_enabled = true
cert_manager_webhook_hetzner_enabled = true
```

Your DNS zone must be managed by Hetzner DNS. Create a Secret containing a Hetzner API token with permission to create and remove DNS records. For `ClusterIssuer` resources, place the Secret in the cert-manager cluster resource namespace, which defaults to `cert-manager`.

```yaml
apiVersion: v1
kind: Secret
metadata:
name: hetzner
namespace: cert-manager
stringData:
token:
```

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-dns01
spec:
acme:
email:
server: https://acme-v02.api.letsencrypt.org/directory
privateKeySecretRef:
name: letsencrypt-dns01-key
solvers:
- dns01:
webhook:
groupName: acme.hetzner.com
solverName: hetzner
config:
tokenSecretKeyRef:
name: hetzner
key: token
```

When using this `ClusterIssuer` with the Gateway shim, replace `cert-manager.io/issuer: letsencrypt-http01` with `cert-manager.io/cluster-issuer: letsencrypt-dns01`.

#### Create the `Gateway` (Cilium GatewayClass + Hetzner LB + Cert Manager integration)

A `Gateway` defines the external entry point for traffic. With `gatewayClassName: cilium`, the resource is reconciled by the Cilium Gateway controller. The `infrastructure.annotations` are passed through as Hetzner-specific load balancer settings (interpreted by the Hetzner Cloud Controller Manager) to control how the Hetzner Cloud LB is created and configured. See: [Load Balancer Annotations](https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/docs/reference/load_balancer_annotations.md)

The `cert-manager.io/issuer: letsencrypt-http01` annotation is used by Cert Manager's Gateway shim so it knows which Issuer to use when populating the TLS secret referenced by the listener.

`tls.certificateRefs` points at the Kubernetes Secret that will contain the issued certificate and private key (`example-com-tls`). Cert Manager will keep that secret up to date, and the Gateway will use it to terminate TLS.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: cilium-gateway
namespace: default
annotations:
cert-manager.io/issuer: letsencrypt-http01
spec:
gatewayClassName: cilium
infrastructure:
annotations:
load-balancer.hetzner.cloud/name: "cilium-gateway-nbg1"
load-balancer.hetzner.cloud/location: "nbg1"
load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
listeners:
- name: https
hostname: example.com
port: 443
protocol: HTTPS
allowedRoutes:
namespaces:
from: All
tls:
mode: Terminate
certificateRefs:
- name: example-com-tls
kind: Secret
group: ""
- name: http
hostname: example.com
port: 80
protocol: HTTP
allowedRoutes:
namespaces:
from: All
```

#### Create an `HTTPRoute` (bind hostname + route traffic to your service)

Finally, the `HTTPRoute` attaches to the Gateway and defines routing rules for `example.com`. In this example it forwards all matching requests to a Kubernetes Service called `example-service` on port `8080`.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: example-app
namespace: default
spec:
hostnames:
- example.com
parentRefs:
- name: cilium-gateway
namespace: default
rules:
- backendRefs:
- name: example-service
port: 8080
```

> ⚠️ **Important:** When using PROXY protocol with Cilium Gateway API (enabled by default), external IPv6 connections will not work due to a bug in Cilium’s Gateway API implementation: [https://github.com/cilium/cilium/issues/42950](https://github.com/cilium/cilium/issues/42950)
>
> If you need IPv6, disable PROXY protocol by adding the `load-balancer.hetzner.cloud/uses-proxyprotocol: "false"` infrastructure annotation and setting this module config:
> ```
> cilium_gateway_api_proxy_protocol_enabled = false
> ```
> After applying the module config, you may need to restart Cilium to pick up the change:
> ```shell
> kubectl -n kube-system rollout restart deployment/cilium-operator
> ```

Ingress Load Balancer

The ingress controller uses a default load balancer service to manage external traffic. For geo-redundancy and high availability, `ingress_load_balancer_pools` can be configured as an alternative, replacing the default load balancer with the specified pool of load balancers.

##### Configuring Load Balancer Pools
To replace the default load balancer, use `ingress_load_balancer_pools` in the Terraform configuration. This setup ensures high availability and geo-redundancy by distributing traffic from various locations across all targets in all regions.

Example `kubernetes.tf` configuration:
```hcl
ingress_load_balancer_pools = [
{
name = "lb-nbg"
location = "nbg1"
type = "lb11"
},
{
name = "lb-fsn"
location = "fsn1"
type = "lb11"
}
]
```

##### Local Traffic Optimization
Configuring local traffic handling enhances network efficiency by reducing latency. Processing traffic closer to its source eliminates unnecessary routing delays, ensuring consistent performance for low-latency or region-sensitive applications.

Example `kubernetes.tf` configuration:
```hcl
ingress_nginx_kind = "DaemonSet"
ingress_nginx_service_external_traffic_policy = "Local"

ingress_load_balancer_pools = [
{
name = "regional-lb-nbg"
location = "nbg1"
local_traffic = true
},
{
name = "regional-lb-fsn"
location = "fsn1"
local_traffic = true
}
]
```

Key settings in this configuration:
- `local_traffic`: Limits load balancer targets to nodes in the same geographic location as the load balancer, reducing data travel distances and keeping traffic within the region.
- `ingress_nginx_service_external_traffic_policy` set to `Local`: Ensures external traffic is handled directly on the local node, avoiding extra network hops.
- `ingress_nginx_kind` set to `DaemonSet`: Deploys an ingress controller instance on every node, enabling requests to be handled locally for faster response times.

Topology-aware routing in ingress-nginx can optionally be enabled by setting the `ingress_nginx_topology_aware_routing` variable to `true`. This functionality routes traffic to the nearest upstream endpoints, enhancing efficiency for supported services. Note that this feature is only applicable to services that support topology-aware routing. For more information, refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/).

Network Segmentation

By default, this module calculates optimal subnets based on the provided network CIDR (`network_ipv4_cidr`). The network is segmented automatically as follows:

- **1st Quarter**: Reserved for other uses such as classic VMs.
- **2nd Quarter**:
- **1st Half**: Allocated for Node Subnets (`network_node_ipv4_cidr`)
- **2nd Half**: Allocated for Service IPs (`network_service_ipv4_cidr`)
- **3rd and 4th Quarters**:
- **Full Span**: Allocated for Pod Subnets (`network_pod_ipv4_cidr`)

Each Kubernetes node requires a `/24` subnet within `network_pod_ipv4_cidr`. To support this configuration, the optimal node subnet size (`network_node_ipv4_subnet_mask_size`) is calculated using the formula:

32 - (24 - subnet_mask_size(`network_pod_ipv4_cidr`)).

With the default `10.0.0.0/16` network CIDR (`network_ipv4_cidr`), the following values are calculated:
- **Node Subnet Size**: `/25` (Max. 128 Nodes per Subnet)
- **Node Subnets**: `10.0.64.0/19` (Max. 64 Subnets, each with `/25`)
- **Service IPs**: `10.0.96.0/19` (Max. 8192 Services)
- **Pod Subnet Size**: `/24` (Max. 256 Pods per Node)
- **Pod Subnets**: `10.0.128.0/17` (Max. 128 Nodes, each with `/24`)

Please consider the following Hetzner Cloud limits:
- Up to **100 servers** can be attached to a network.
- Up to **100 routes** can be created per network.
- Up to **50 subnets** can be created per network.
- A project can have up to **50 placement groups**.

A `/16` Network CIDR is sufficient to fully utilize Hetzner Cloud's scaling capabilities. It supports:
- Up to 100 nodes, each with its own `/24` Pod subnet route.
- Configuration of up to 50 nodepools, one nodepool per subnet, each with at least one placement group.

Here is a table with more example calculations:
| Network CIDR | Node Subnet Size | Node Subnets | Service IPs | Pod Subnets |
| --------------- | ---------------- | ----------------- | ------------------- | ------------------- |
| **10.0.0.0/16** | /25 (128 IPs) | 10.0.64.0/19 (64) | 10.0.96.0/19 (8192) | 10.0.128.0/17 (128) |
| **10.0.0.0/17** | /26 (64 IPs) | 10.0.32.0/20 (64) | 10.0.48.0/20 (4096) | 10.0.64.0/18 (64) |
| **10.0.0.0/18** | /27 (32 IPs) | 10.0.16.0/21 (64) | 10.0.24.0/21 (2048) | 10.0.32.0/19 (32) |
| **10.0.0.0/19** | /28 (16 IPs) | 10.0.8.0/22 (64) | 10.0.12.0/22 (1024) | 10.0.16.0/20 (16) |
| **10.0.0.0/20** | /29 (8 IPs) | 10.0.4.0/23 (64) | 10.0.6.0/23 (512) | 10.0.8.0/21 (8) |
| **10.0.0.0/21** | /30 (4 IPs) | 10.0.2.0/24 (64) | 10.0.3.0/24 (256) | 10.0.4.0/22 (4) |

Storage Configuration

#### Hetzner Cloud CSI

The Hetzner Cloud Container Storage Interface (CSI) driver can be flexibly configured through the `hcloud_csi_storage_classes` variable. You can define multiple storage classes for your cluster:

* **name:** The name of the StorageClass (string, required).
* **encrypted:** Enable LUKS encryption for volumes (bool, required).
* **defaultStorageClass:** Set this class as the default (optional, bool, defaults to `false`).
* **reclaimPolicy:** The Kubernetes reclaim policy (`Delete` or `Retain`, optional, defaults to `Delete`).
* **extraParameters:** Additional parameters for the StorageClass (optional map).

**Example:**

```hcl
hcloud_csi_storage_classes = [
{
name = "hcloud-volumes"
encrypted = false
defaultStorageClass = true
},
{
name = "hcloud-volumes-encrypted-xfs"
encrypted = true
reclaimPolicy = "Retain"
extraParameters = {
"csi.storage.k8s.io/fstype" = "xfs"
"fsFormatOption" = "-i nrext64=1"
}
}
]
```

**Other settings:**

* **hcloud\_csi\_encryption\_passphrase:**
Optionally provide a custom encryption passphrase for LUKS-encrypted storage classes.

```hcl
hcloud_csi_encryption_passphrase = ""
```

**Storage Class Immutability:**
StorageClasses created by the Hcloud CSI driver are immutable. To change parameters after creation, you must either edit the StorageClass directly with `kubectl`, or delete it from both Terraform state and Kubernetes, then let this module recreate it.

For more details, see the [HCloud CSI Driver documentation](https://github.com/hetznercloud/csi-driver/tree/main/docs/kubernetes).

#### Longhorn

Longhorn is a lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes.
It is fully independent from the Hetzner Cloud CSI driver.

You can enable Longhorn and configure it as the default StorageClass for your cluster via module variables:

* **Enable Longhorn:**
Set `longhorn_enabled` to `true` to deploy Longhorn in your cluster.

* **Default StorageClass:**
Set `longhorn_default_storage_class` to `true` if you want Longhorn to be the default StorageClass.

**Example:**

```hcl
longhorn_enabled = true
longhorn_default_storage_class = true
```

For more information about Longhorn, see the [Longhorn documentation](https://longhorn.io/docs/).

Talos Backup

This module natively supports Hcloud Object Storage. Below is an example of how to configure backups with [MinIO Client](https://github.com/minio/mc?tab=readme-ov-file#homebrew) (`mc`) and Hcloud Object Storage. While it's possible to create the bucket through the [Hcloud Console](https://console.hetzner.cloud), this method does not allow for the configuration of automatic retention policies.

Create an alias for the endpoint using the following command:
```sh
mc alias set \
https://.your-objectstorage.com \
\
--api "s3v4" \
--path "off"
```

Create a bucket with automatic retention policies to protect your backups:
```sh
mc mb --with-lock --region /
mc retention set GOVERNANCE 14d --default /
```

Configure your `kubernetes.tf` file:
```hcl
talos_backup_s3_hcloud_url = "https://..your-objectstorage.com"
talos_backup_s3_access_key = ""
talos_backup_s3_secret_key = ""

# Optional: AGE X25519 Public Key for encryption
talos_backup_age_x25519_public_key = ""

# Optional: Change schedule (cron syntax)
talos_backup_schedule = "0 * * * *"
```

For users of other object storage providers, configure `kubernetes.tf` as follows:
```hcl
talos_backup_s3_region = ""
talos_backup_s3_endpoint = ""
talos_backup_s3_bucket = ""
talos_backup_s3_prefix = ""

# Use path-style URLs (set true if required by your provider)
talos_backup_s3_path_style = true

# Access credentials
talos_backup_s3_access_key = ""
talos_backup_s3_secret_key = ""

# Optional: AGE X25519 Public Key for encryption
talos_backup_age_x25519_public_key = ""

# Optional: Change schedule (cron syntax)
talos_backup_schedule = "0 * * * *"
```

To recover from a snapshot, please refer to the Talos Disaster Recovery section in the [Documentation](https://www.talos.dev/latest/advanced/disaster-recovery/#recovery).

Talos Bootstrap Manifests

### Component Deployment Control

During cluster provisioning, each component manifest is applied using Talos’s bootstrap manifests feature. Components are upgraded as part of the normal lifecycle of this module.
You can enable or disable component deployment using the variables below:

```hcl
# Core Components (enabled by default)
cilium_enabled = true
talos_backup_s3_enabled = true
talos_ccm_enabled = true
talos_coredns_enabled = true
hcloud_ccm_enabled = true
hcloud_csi_enabled = true
metrics_server_enabled = true
prometheus_operator_crds_enabled = true

# Additional Components (disabled by default)
cert_manager_enabled = true
cert_manager_webhook_hetzner_enabled = true
ingress_nginx_enabled = true
longhorn_enabled = true

# Enable etcd backup by defining one of these variables:
talos_backup_s3_endpoint = "https://..."
talos_backup_s3_hcloud_url = "https://..your-objectstorage.com"

# Cluster Autoscaler: Enabled when node pools are defined
cluster_autoscaler_nodepools = [
{
name = "autoscaler"
type = "cpx22"
location = "nbg1"
min = 0
max = 6
labels = {
"autoscaler-node" = "true"
}
taints = [
"autoscaler-node=true:NoExecute"
]
}
]
```

> **Note:** Disabling a component **does not delete** its existing resources.
> This is documented in the [Talos documentation](https://www.talos.dev/latest/kubernetes-guides/upgrading-kubernetes/#automated-kubernetes-upgrade).
> You must remove deployed resources manually after disabling a component in the manifests.

---

### Adding Additional Manifests

Besides the default components, you can add extra bootstrap manifests as follows:

```hcl
# Extra remote manifests (URLs fetched at apply time)
talos_extra_remote_manifests = [
"https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml"
]

# Extra inline manifests (defined directly)
talos_extra_inline_manifests = [
{
name = "test-manifest"
contents = <<-EOF
---
apiVersion: v1
kind: Secret
metadata:
name: test-secret
data:
secret: dGVzdA==
EOF
}
]
```

Talos Discovery Service

Talos supports two node discovery mechanisms:

- **Discovery Service Registry** (default): A public, external registry operated by Sidero Labs that works even when Kubernetes is unavailable. Nodes must have outbound access to TCP port 443 to communicate with it.
- **Kubernetes Registry**: Relies on Kubernetes Node metadata stored in etcd.

This module uses the discovery service to perform additional health checks during Talos upgrades, Kubernetes upgrades, and Kubernetes manifest synchronization. If no discovery mechanism is enabled, these additional checks will be skipped.

> ⚠️ **Important:** Kubernetes-based discovery is **incompatible by default** with Kubernetes **v1.32+** due to the `AuthorizeNodeWithSelectors` feature gate, which restricts access to Node metadata. This can cause broken discovery behavior, such as failing or incomplete results from `talosctl health` or `talosctl get members`.

##### Example Configuration

```hcl
# Disable Kubernetes-based discovery (deprecated in Kubernetes >= 1.32)
talos_kubernetes_discovery_service_enabled = false

# Enable the external Sidero Labs discovery service (default)
talos_siderolabs_discovery_service_enabled = true
```

For more details, refer to the [official Talos discovery guide](https://www.talos.dev/latest/talos-guides/discovery/).

Kubernetes RBAC

This module allows you to create custom Kubernetes RBAC (Role-Based Access Control) roles and cluster roles that define specific permissions for users and groups. RBAC controls what actions users can perform on which Kubernetes resources.
These custom roles can be used independently or combined with OIDC group mappings to automatically assign permissions based on user group membership from your identity provider.

#### Example Configuration

##### Cluster Roles (`rbac_cluster_roles`)

```hcl
rbac_cluster_roles = [
{
name = my-cluster-role # ClusterRole name
rules = [
{
api_groups = [""] # Core API group (empty string for core resources)
resources = ["nodes"] # Cluster-wide resources this role can access
verbs = ["get", "list", "watch"] # Actions allowed on these resources
}
]
}
]
```

##### Namespaced Roles (`rbac_roles`)

```hcl
rbac_roles = [
{
name = "my-role" # Role name
namespace = "target-namespace" # Namespace where the role will be created
rules = [
{
api_groups = [""] # Core API group (empty string for core resources)
resources = ["pods", "services"] # Resources this role can access
verbs = ["get", "list", "watch"] # Actions allowed on these resources
}
]
}
]
```

OIDC Cluster Authentication

The Kubernetes API server supports OIDC (OpenID Connect) authentication, allowing integration with external identity providers like Keycloak, Auth0, Authentik, Zitadel, etc.
When enabled, users can authenticate using their existing organizational credentials instead of managing separate Kubernetes certificates or tokens.

OIDC authentication works by validating JWT tokens issued by your identity provider, extracting user information and group memberships, and mapping them to Kubernetes RBAC roles.

#### Example Configuration

```hcl
# OIDC Configuration
oidc_enabled = true # Enable OIDC authentication
oidc_issuer_url = "https://your-oidc-provider.com" # Your OIDC provider issuer URL
oidc_client_id = "your-client-id" # Client ID registered in your OIDC provider
oidc_username_claim = "preferred_username" # OIDC JWT claim to extract username from
oidc_groups_claim = "groups" # OIDC JWT claim to extract user groups from
oidc_groups_prefix = "oidc:" # Prefix added to group names in K8s to avoid conflicts

# Map OIDC groups to Kubernetes roles and cluster roles
oidc_group_mappings = [ # List of OIDC group mappings
{
group = "cluster-admins-group" # OIDC provider group name
cluster_roles = ["cluster-admin"] # Grant cluster-admin access
},
{
group = "developers-group" # OIDC provider group name
cluster_roles = ["view"] # Grant cluster-wide view access
roles = [ # Grant namespace scoped roles
{
name = "developer-role" # Custom role name
namespace = "development" # Namespace where role applies
}
]
}
]
```

#### Client Configuration with kubelogin

Once OIDC is configured in your cluster, you'll need to configure your local kubectl to authenticate using OIDC tokens. This requires the [kubelogin](https://github.com/int128/kubelogin) plugin.

##### Install kubelogin

```bash
# Homebrew (macOS and Linux)
brew install kubelogin

# Krew (macOS, Linux, Windows and ARM)
kubectl krew install oidc-login

# Chocolatey (Windows)
choco install kubelogin
```

#### Test OIDC Authentication

First, verify that your OIDC provider is returning proper JWT tokens. Replace the placeholder values with your actual OIDC configuration:

```bash
kubectl oidc-login setup \
--oidc-issuer-url=https://your-oidc-provider.com \
--oidc-client-id=your-client-id \
--oidc-client-secret=your-client-secret \
--oidc-extra-scope=openid,email,profile # Add or change the scopes according to your IDP
```

This will open your browser for authentication. After successful login, you should see a JWT token in your terminal that looks like:

```json
{
"aud": "your-client-id",
"email": "user@example.com",
"email_verified": true,
"exp": 1749867571,
"groups": [
"developers",
"kubernetes-users"
],
"iat": 1749863971,
"iss": "https://your-oidc-provider.com",
"nonce": "random-nonce-string",
"sub": "user-unique-identifier"
}
```

Verify that:

- The `groups` array contains your expected groups
- The `email` field matches your user email
- `email_verified` is `true` (required by K8s)

#### Configure kubectl

Add a new user to your `~/.kube/config` file:

```yaml
users:
- name: oidc-user
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
command: kubectl
args:
- oidc-login
- get-token
- --oidc-issuer-url=https://your-oidc-provider.com
- --oidc-client-id=your-client-id
- --oidc-client-secret=your-client-secret
- --oidc-extra-scope=groups
- --oidc-extra-scope=email
- --oidc-extra-scope=name # Add or change the scopes according to your IDP
```

Update your context to use the new OIDC user:

```yaml
contexts:
- context:
cluster: your-cluster
namespace: default
user: oidc-user # Changed from certificate-based user
name: oidc@your-cluster # Updated context name
```

Now you can switch to the OIDC context and authenticate using your identity provider:

```bash
kubectl config use-context your-cluster-oidc
kubectl get pods # This will trigger OIDC authentication
```

Trusted CA Certificates

Talos allows adding additional trusted root CA certificates to the system. This is useful for private image registries, OIDC providers with internal CAs, or other enterprise services.

The `talos_certificates` variable supports several input formats for maximum flexibility:

#### Single Certificate (Inline)

```hcl
talos_certificates = {
"my-custom-ca" = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
}
```

#### Multiple Certificates (List)

```hcl
talos_certificates = {
"enterprise-ca" = [
file("root.crt"),
file("intermediate.crt")
]
}
```

#### Multi-Cert PEM Block (String)

```hcl
talos_certificates = {
"legacy-ca" = <<-EOT
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
EOT
}
```

## ♻️ Lifecycle
This module manages the versions of Talos, Kubernetes, and all bundled [components](#-components) together. Each Hcloud K8s major version targets a defined Talos and Kubernetes minor version, while component versions are selected for compatibility with that Kubernetes release.

### πŸ“Œ Hcloud Kubernetes Versions
The table below lists the Talos and Kubernetes versions used by each Hcloud K8s version.

| Hcloud K8s | Talos Linux | Kubernetes |
| :--------: | :---------: | :--------: |
| **(7)** | (1.15) | 1.36 |
| **(6)** | (1.14) | 1.35 |
| **(5)** | 1.13 | 1.34 |
| **4** | 1.12 | 1.33 |

Parenthesized versions are planned targets.

### β˜‘οΈ Kubernetes Compatibility Matrix
The table below lists the **minimum required versions** of each component to support the specified Kubernetes release.

| Kubernetes | Hcloud CCM | Hcloud CSI | Longhorn | Cilium | Ingress NGINX | Cert Manager |
| :--------: | :--------: | :--------: | :------: | :------: | :-----------: | :----------: |
| **1.36** | β‰₯ 1.31 | β‰₯ 2.21 | ? | ? | - | (β‰₯ 1.21) |
| **1.35** | β‰₯ 1.30 | β‰₯ 2.19 | β‰₯ 1.11 | β‰₯ 1.19.2 | β‰₯ 4.15 | β‰₯ 1.19 |
| **1.34** | β‰₯ 1.27 | β‰₯ 2.18 | β‰₯ 1.11 | β‰₯ 1.19 | β‰₯ 4.14 | β‰₯ 1.19 |
| **1.33** | β‰₯ 1.26 | β‰₯ 2.14 | β‰₯ 1.8.2 | β‰₯ 1.18 | β‰₯ 4.13 | β‰₯ 1.18 |

### ⬆️ Upgrade Policy
Any minor or major upgrade to **Talos** or **Kubernetes** results in a major version change for this module. Downgrades are generally neither supported nor tested.

Changing software versions manually is not recommended. Component versions are selected and tested for compatibility with the Kubernetes release shown above.

> [!IMPORTANT]
> - Do not combine major or minor module upgrades with infrastructure changes.
> - Upgrade to the latest minor release before upgrading to the next major version.
> - Verify that the cluster is healthy before and after upgrades.

### 🧭 Roadmap
* [ ] **Ingress NGINX [Retirement in March 2026](https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/)**

* [x] Add general support for Gateway API
* [x] Integrate Cilium Gateway API
* [x] Deprecate Ingress NGINX in v4
* [ ] Remove Ingress NGINX in v6
* [ ] **Support for Hetzner [Dedicated Bare Metal Servers](https://www.hetzner.com/de/dedicated-rootserver/)**

## ❀️ Support this Project
If you'd like to support this project, please consider leaving a ⭐ on GitHub!


Stars



Star History Chart

> [!TIP]
> If you don’t have a Hetzner account yet, you can use this [Hetzner Cloud Referral Link](https://hetzner.cloud/?ref=GMylKeDmqtsD) to claim a €20 credit and support this project at the same time.

### πŸ’– Special Thanks to All Sponsors! πŸ’–


mammouth.aiΒ Β 



Jona KoudijsΒ Β devXYΒ Β 

Your sponsorship supports the ongoing development, improvement, and maintenance of this project πŸ™

**Become a Sponsor:**
-

GitHub Sponsors


-

Liberapay Patrons

## πŸ‘‹ Community
We welcome everyone to join the [Discussions](https://github.com/hcloud-k8s/terraform-hcloud-kubernetes/discussions), report [Issues](https://github.com/hcloud-k8s/terraform-hcloud-kubernetes/issues/new/choose), and help improve this project.

### 🀝 Contributing



Contributions are always welcome!

## πŸ“Ž Project Info
Hcloud Kubernetes is an independent, [open source](https://opensource.org/osd) project built for the Hetzner Community. Its goal is to make Kubernetes deployments as predictable, repeatable, and flexible as possible.


Supported by Hetzner


This project is supported by Hetzner through testing infrastructure ❀️


### βš–οΈ License

Distributed under the MIT License. See [LICENSE](https://github.com/hcloud-k8s/terraform-hcloud-kubernetes/blob/main/LICENSE) for more information.

### πŸ’Ž Acknowledgements

- [Talos Linux](https://www.talos.dev) for its impressively secure, immutable, and minimalistic Kubernetes distribution.
- [Hetzner Cloud](https://www.hetzner.com/cloud) for offering excellent cloud infrastructure with robust Kubernetes integrations.