https://github.com/brutus5000/k8s-config

FAForever Kubernetes stack

= FAF K8S config

This repository implements the link:ARCHITECTURE.md[FAF architecture] in Kubernetes. It's intended to supersede the
current https://github.com/FAForever/faf-stack[FAF Docker-Compose stack].

== Requirements & tools

* https://k3s.io[k3s] or https://k3d.io/[k3d] installed
** The main node must be started with the following argument: +
`--node-label=storage-id=main-01` (marks it as the main storage node)
* https://stedolan.github.io/jq/[jq] installed (required for scripts)
* *Recommended:* A Kubernetes UI such as https://k8slens.dev/[Lens] (GUI) or https://k9scli.io/[k9s] (CLI)
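
As a minimal sketch, the node label above can also be set persistently in the k3s configuration file (by default `/etc/rancher/k3s/config.yaml`), whose keys mirror the command line flags. Whether the main node is configured this way or via command line arguments is an assumption and depends on the setup.

[source,yaml]
----
# /etc/rancher/k3s/config.yaml on the main node
# Equivalent to passing --node-label=storage-id=main-01 on the command line.
node-label:
  - "storage-id=main-01"
----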

== Motivation

=== Immediate goals

* Allow more people access to logs, configuration and deployments for certain services without giving them full server
access.
** Role based access control
** Direct cluster access via kubectl for all authorized developers
** Improved configuration and secret management
* Advanced resource controls thanks to CPU and RAM limits (no app shall consume all CPU ever again)
* Easier debugging in the test environment thanks to port-forwarding of pods, without compromising the production config

=== Long-term goals

* Better integration into CI pipelines with automated deployments

=== Non-Goals

* We do not move to k8s because it is cool and fancy!
** K8s has much more complexity compared to our docker-compose stack
** We would avoid it, if docker-compose could solve our problems (which it can't).
** It was a very conscious trade-off decision.
* We do not move to k8s because we want to deploy FAF on a managed cloud provider.
** Cloud providers are super expensive. We'd have nothing to gain here.
* We do not move to k8s to become highly available!
** High availability only works if all components are highly available. Most of our apps are not built in that way at
all.
** Deployments with less downtime might be a benefit for _some_ services.

== Decision Log

=== Distribution selection

* We'll use k3s.
** It is fully supported by NixOS and is a simplified distribution which should be easier to maintain.
** It also runs on developer machines.
** It uses few resources.
* Running the same distribution on prod and on local machines makes things more predictable and scripts more stable.
** Minikube should be mostly compatible, if some devs insist on using it.

=== Volume management

* We'll run with manually managed persistent volumes and claims, because we need predictable paths.
* Predictable paths are a necessity for managing the volumes with ZFS.
* Using k3s local-path-provisioner we can define the prefix (in the configmap `local-path-config`) and the suffix
(in the mounting options in the pod), but in between these there is a random uuid we can't know beforehand. +
This breaks predefined setups and scripts.
* The K8s built-in local volume type with node affinity ensures that all data of a volume is stored on a selected node (the node labelled `storage-id=main-01`).
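
A manually managed volume along these lines could look like the following sketch. The resource names, the namespace, the storage class name, the size and the host path are all hypothetical; only the `storage-id=main-01` node label is taken from this document.

[source,yaml]
----
apiVersion: v1
kind: PersistentVolume
metadata:
  name: faf-db-pv                   # hypothetical name
spec:
  capacity:
    storage: 10Gi                   # hypothetical size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-manual    # hypothetical; a class name without a dynamic provisioner
  local:
    path: /opt/faf/data/faf-db      # hypothetical, but predictable, path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: storage-id       # pins the volume to the main storage node
              operator: In
              values:
                - main-01
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: faf-db-pvc                  # hypothetical name
  namespace: faf-apps               # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-manual
  volumeName: faf-db-pv             # binds the claim to exactly this volume
  resources:
    requests:
      storage: 10Gi
----

Because the claim names its volume explicitly, no provisioner is involved and the path on disk is known before anything is deployed.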

=== Traefik IngressRoute over default Ingress definitions

* K3s comes with Traefik as Ingress controller by default.
* The default Ingress controller in the outside world is nginx.
* Traefik is well known to FAF, since we use it extensively as reverse proxy in our faf-stack.
* Traefik offers support for
** classic Ingress definitions, which require ingress annotations to use more advanced features (similar to Traefik labels in our current docker-compose.yml)
** custom IngressRoute definitions, which map the exact Traefik feature set into a yaml format (no annotations required)

We have to select which resource type we use and we should stick to it consistently. As always it's a tradeoff:

* Pro classic Ingress
** Classic Ingress is stable by now (though not for very long), while Traefik IngressRoutes are still marked as alpha (yet we have used Traefik for quite a while and there were rarely changes, even from 1.x to 2.x)
** Classic Ingress is a well-known syntax and understood by most external K8s users, so the entry barrier for external contributions is lower. However, a lot of functionality would hide behind the Traefik annotations, which people would still need to learn to understand it all.
** Using classic ingress would allow us to swap out Traefik anytime and still have a mostly working setup
* Pro Traefik IngressRoute
** We (the Ops guys responsible for FAF) see Traefik as superior compared to Nginx (and moved from Nginx to Traefik as reverse proxy years ago)
*** Thus we do not expect to move back
** We have an existing stack we need to migrate 1:1
** Since we use Traefik features anyway, using the IngressRoute reduces the overall yaml complexity, as we do not split logic and annotations
** Traefik syntax seems easier to understand than regular Ingress, so using Traefik syntax might lower the barrier for external contributors who never used classic Ingress.

**Decision:** We'll use Traefik IngressRoutes.
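
For illustration, an IngressRoute might look like the sketch below. The host name, service name, namespace, port and resolver name are hypothetical. The `apiVersion` shown is the one used by Traefik 2.x; newer Traefik releases use `traefik.io/v1alpha1` instead.

[source,yaml]
----
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: faf-api                     # hypothetical name
  namespace: faf-apps               # hypothetical namespace
spec:
  entryPoints:
    - websecure                     # HTTPS entry point of the Traefik deployment
  routes:
    - match: Host(`api.example.org`)  # hypothetical host, same rule syntax as Traefik labels
      kind: Rule
      services:
        - name: faf-api             # hypothetical Kubernetes service
          port: 8010                # hypothetical port
  tls:
    certResolver: letsencrypt       # hypothetical resolver name
----

The routing rule and the TLS settings live in the resource itself, so no annotations are needed.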

=== Certificate management & Let's Encrypt

* We could go with Traefik certificate resolvers or use cert-manager
* cert-manager works with both classic Ingress definitions and Traefik-specific IngressRoutes
** Needs additional software
** Has a short support cycle (6 months per point release)
** => More maintenance overhead
* Traefik's internal Let's Encrypt resolver needs to be manually configured on the node
** It stores certificates somewhere on disk
** The easiest approach is a persistent volume on the main storage node
*** This effectively restricts Traefik to run on a single node
** A more sophisticated approach is storing the certificates in a persistent remote / network volume
** Once we have full Cloudflare access, we can do Cloudflare DNS challenge using a Cloudflare token. Then Traefik does not need to issue one certificate per subdomain. It's unclear though if this makes persisting the certificate obsolete.

**Decision:** We'll use Traefik as long as we don't run into any problems, since it seems to be less of a maintenance burden. Cert-manager can still be introduced later if required.
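
One possible way to configure the resolver for the Traefik that ships with k3s is a `HelmChartConfig` resource that overrides values of the bundled chart. This is a sketch under assumptions: the resolver name, the e-mail address, the challenge type and the storage path are hypothetical, and the `persistence` and `additionalArguments` values should be checked against the Traefik chart version in use.

[source,yaml]
----
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik                     # must match the name of the bundled chart
  namespace: kube-system
spec:
  valuesContent: |-
    persistence:
      enabled: true                 # keep issued certificates across restarts
    additionalArguments:
      # "letsencrypt" is a hypothetical resolver name
      - "--certificatesresolvers.letsencrypt.acme.email=admin@example.org"
      - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
----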

=== RabbitMQ

For RabbitMQ there are three potential ways to implement it:

* Manually define a single-node StatefulSet as a 1:1 copy of faf-stack.
* A Helm chart from Bitnami
* Deploying the RabbitMQ operator

**Decision:** We'll go with the Bitnami Helm chart. It is highly configurable and can read _our_ secrets, so the template can be configured to fit our setup exactly. This simplifies things compared to a manual StatefulSet. The RabbitMQ operator seems much more complex for now.
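
A values file for the chart that points it at an existing secret could look roughly like this. The user name and the secret name are hypothetical, and the value names as well as the expected secret keys are assumptions that should be verified against the chart version in use.

[source,yaml]
----
# values.yaml for the Bitnami RabbitMQ chart (sketch)
replicaCount: 1                     # single node, like the current faf-stack setup
auth:
  username: faf                     # hypothetical user name
  # Hypothetical secret, created outside of the chart. The chart is assumed to
  # read the keys "rabbitmq-password" and "rabbitmq-erlang-cookie" from it.
  existingPasswordSecret: rabbitmq-credentials
  existingErlangSecret: rabbitmq-credentials
----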

=== User access and RBAC

* We want to give access to multiple people with potentially different permissions.
* Handing out service account certificates is quite annoying.
* An SSO login via OIDC is preferred and supported by K8s / K3s.
** The preferred identity provider would be GitHub, as all developers are there and it's outside the system itself. Unfortunately GitHub only supports OAuth2 and not OIDC.
** Google accounts would be an alternative, but we don't want to force people on Google.
** We'll use FAF's custom login instead.
** As a fallback (in case the FAF login is broken) we still have the main service account.
* RBAC t.b.d.
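
In k3s, OIDC login is enabled by passing the corresponding flags to the API server, for example through the k3s configuration file. The issuer URL, the client id and the claim names below are hypothetical placeholders for the FAF login, not its actual settings.

[source,yaml]
----
# /etc/rancher/k3s/config.yaml on the server node (sketch)
kube-apiserver-arg:
  - "oidc-issuer-url=https://login.example.org/"   # hypothetical issuer
  - "oidc-client-id=faf-k8s"                       # hypothetical client id
  - "oidc-username-claim=sub"                      # token claim used as the user name
  - "oidc-groups-claim=groups"                     # token claim used for group membership
----

Users and groups taken from the token can then be referenced as subjects in RBAC bindings once those are defined.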

=== Developer environment & reproducibility

- No service shall go live if its initial configuration or installation can't be scripted.
- Everything must be runnable on a single-node cluster.
- Scripts shall be idempotent / re-runnable without fatal consequences. We will use k8s annotations to keep track of the state.
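
As a sketch of the state tracking idea, a script could mark a resource with an annotation once its one-time setup has finished and skip that step on later runs if the annotation is already present. The resource, the namespace, the annotation key and its value are hypothetical.

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: faf-db-init                 # hypothetical resource touched by an init script
  namespace: faf-apps               # hypothetical namespace
  annotations:
    # Hypothetical marker written by the script after a successful run.
    faforever.com/init-state: "done"
----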