{"id":18001035,"url":"https://github.com/brutus5000/k8s-config","last_synced_at":"2025-03-26T07:32:20.376Z","repository":{"id":115601421,"uuid":"432005203","full_name":"Brutus5000/k8s-config","owner":"Brutus5000","description":"FAForever Kubernetes stack","archived":false,"fork":false,"pushed_at":"2022-10-19T07:28:39.000Z","size":82,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-21T11:50:27.519Z","etag":null,"topics":["faforever","infrastructure","kubernetes"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Brutus5000.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-25T23:53:35.000Z","updated_at":"2022-10-11T14:59:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"a880da9a-d77a-4f02-9570-216a1b9bed14","html_url":"https://github.com/Brutus5000/k8s-config","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Brutus5000%2Fk8s-config","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Brutus5000%2Fk8s-config/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Brutus5000%2Fk8s-config/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Brutus5000%2Fk8s-config/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Brutus5000","download_url":"https://codeload.
github.com/Brutus5000/k8s-config/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245611983,"owners_count":20643940,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faforever","infrastructure","kubernetes"],"created_at":"2024-10-29T23:15:53.442Z","updated_at":"2025-03-26T07:32:20.368Z","avatar_url":"https://github.com/Brutus5000.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"= FAF K8S config\n\nThis repository implements the link:ARCHITECTURE.md[FAF architecture] in Kubernetes. It's intended to supersede the\ncurrent https://github.com/FAForever/faf-stack[FAF Docker-Compose stack].\n\n== Requirements \u0026 tools\n\n* https://k3s.io[k3s] or https://k3d.io/[k3d] installed\n** The main node must run with some arguments: +\n`--node-label=storage-id=main-01` (determines the main storage node)\n* https://stedolan.github.io/jq/[jq] installed (required for scripts)\n* *Recommended:* A k8s ui such as https://k8slens.dev/[Lens] (GUI) or https://k9scli.io/[k9s] (CLI)\n\n== Motivation\n\n=== Immediate goals\n\n* Allow more people access to logs, configuration and deployments for certain services without giving them full server\naccess.\n** Role based access control\n** Direct cluster access via kubectl for all authorized developers\n** Improved configuration and secret management\n* Advanced resource controls thanks to cpu and ram limits (no app shall consume all CPU ever again)\n* Easier debugging on test environment due to port-forwarding of pods without compromising production 
config\n\n=== Long-term goals\n\n* Better integration into CI pipelines with automated deployments\n\n=== Non-Goals\n\n* We do not move to k8s because it is cool and fancy!\n** K8s has much more complexity compared to our docker-compose stack\n** We would avoid it, if docker-compose could solve our problems (which it can't).\n** It was a very conscious trade-off decision.\n* We do not move to k8s because we want to deploy FAF on a managed cloud provider.\n** Cloud providers are super expensive. We'd have nothing to gain here.\n* We do not move to k8s to become highly available!\n** High availability only works if all components are highly available. Most of our apps are not built in that way at\n   all.\n** Deployments with less downtime might be a benefit for _some_ services.\n\n== Decision Log\n\n=== Distribution selection\n\n* We'll use k3s.\n** It is fully supported by NixOS and is a simplified distribution which should be easier to maintain.\n** It also runs on developer machines.\n** It uses few resources.\n* Running the same distribution on prod and on local machines makes things more predictable and scripts more stable.\n** Minikube should be mostly compatible, if some devs insist on using it\n\n\n=== Volume management\n\n* We'll run with manually managed persistent volumes and claims, because we need predictable paths.\n* Predictable paths are a necessity for managing the volumes with ZFS. +\n* Using k3s local-path-provisioner we can define the prefix (in the configmap `local-path-config`) and the suffix\n  (in the mounting options in the pod), but in between these there is a random uuid we can't know beforehand. 
+\nThis breaks predefined setups and scripts.\n* K8s built-in local-path with node affinity ensures all data of a volume is stored on a selected node (the node with the label `storage-id=main-01`)\n\n=== Traefik IngressRoute over default Ingress definitions\n\n* K3s comes with Traefik as Ingress controller by default.\n* The default Ingress controller in the outside world is nginx.\n* Traefik is well known to FAF since we use it as reverse proxy in our faf-stack extensively\n* Traefik offers support for\n** classic Ingress definitions, but requires ingress annotations to use more advanced features (similar to Traefik labels in our current docker-compose.yml)\n** custom IngressRoute definitions which map the exact Traefik feature set into a yaml format (no annotations required)\n\nWe have to select which resource type we use and we should stick to it consistently. As always it's a tradeoff:\n\n* Pro classic Ingress\n** Classic Ingress is stable by now (though not for long), while Traefik IngressRoutes are still marked as alpha (yet we have used Traefik for quite a while and there were rarely any changes even from 1.x to 2.x)\n** Classic Ingress is a well-known syntax and understood by most external K8s users. So the entry barrier for external contributions is lower. 
However, a lot of functionality would hide behind the Traefik annotations, which people would still need to learn to understand it all.\n** Using classic Ingress would allow us to swap out Traefik anytime and still have a mostly working setup\n* Pro Traefik IngressRoute\n** We (the FAF responsible Ops guys) see Traefik as superior compared to Nginx (and moved from Nginx to Traefik as reverse proxy years ago)\n*** Thus we do not expect to move back\n** We have an existing stack we need to migrate 1:1\n** Since we use Traefik features anyway, using the IngressRoute reduces the overall yaml complexity as we do not split logic and annotations\n** Traefik syntax seems easier to understand than regular Ingress, so using Traefik syntax might lower the barrier for external contributors who never used classic Ingress.\n\n**Decision:** We'll use Traefik IngressRoutes.\n\n\n=== Certificate management \u0026 Let's Encrypt\n\n* We could go with Traefik certificate resolvers or use cert-manager\n* Cert-Manager works with classic Ingress routes and Traefik-specific IngressRoutes\n** Needs additional software\n** Has a short support cycle (6 months per point release)\n** =\u003e More maintenance overhead\n* Traefik's internal Let's Encrypt resolver needs to be manually configured on the node\n** It stores certificates somewhere on disk\n** The easiest approach is a persistent volume on the main storage node\n*** This effectively restricts Traefik to run on a single node\n** A more sophisticated approach is storing the certificates in a persistent remote / network volume\n** Once we have full Cloudflare access, we can do Cloudflare DNS challenge using a Cloudflare token. Then Traefik does not need to issue one certificate per subdomain. It's unclear though if this makes persisting the certificate obsolete.\n\n**Decision:** We'll use Traefik as long as we don't run into any problems, since it seems to be less of a maintenance burden. 
Cert-manager can still be introduced later if required.\n\n\n=== RabbitMQ\n\nFor RabbitMQ there are three potential ways of implementing it:\n\n* Manually define a single-node statefulset as a 1:1 copy of faf-stack.\n* A Helm chart from Bitnami\n* Deploying the RabbitMQ operator\n\n**Decision:** We'll go with the Bitnami Helm chart. It is highly configurable and can read _our_ secrets, so the template can be perfectly configured. This simplifies coding compared to a manual statefulset. The RabbitMQ operator seems much more complex for now.\n\n\n=== User access and RBAC\n\n* We want to give access to multiple people with potentially different permissions.\n* Handing out service account certificates is quite annoying.\n* An SSO login via OIDC is preferred and supported by K8s / K3s.\n** The preferred identity provider would be GitHub, as all developers are there and it's outside the system itself. Unfortunately GitHub only supports OAuth2 and not OIDC.\n** Google accounts would be an alternative, but we don't want to force people onto Google.\n** We'll use FAF's custom login instead.\n** As a fallback (in case the FAF login is broken) we still have the main service account.\n* RBAC t.b.d.\n\n=== Developer environment \u0026 reproducibility\n\n- No service shall go live if its initial configuration or installation can't be scripted.\n- Everything must be runnable on a single-node cluster.\n- Scripts shall be idempotent / re-runnable without fatal consequences. We will use k8s annotations to keep track of the state.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrutus5000%2Fk8s-config","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrutus5000%2Fk8s-config","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrutus5000%2Fk8s-config/lists"}