{"id":26088775,"url":"https://github.com/gardener/kupid","last_synced_at":"2026-01-16T06:54:44.571Z","repository":{"id":37979840,"uuid":"267070183","full_name":"gardener/kupid","owner":"gardener","description":"Inject scheduling criteria into target pods orthogonally by policy definition.","archived":false,"fork":false,"pushed_at":"2025-04-08T10:30:40.000Z","size":17788,"stargazers_count":12,"open_issues_count":2,"forks_count":21,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-08T11:25:59.615Z","etag":null,"topics":["affinity","golang","kubernetes","kubernetes-controller","kubernetes-operator","nodeaffinity","pod-scheduling-criteria","scheduler","scheduling","scheduling-algorithms","scheduling-policies","tainting","tolerations","webhook"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gardener.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-26T14:45:33.000Z","updated_at":"2025-04-08T10:30:41.000Z","dependencies_parsed_at":"2025-02-20T10:26:31.308Z","dependency_job_id":"407a933e-64c4-4cde-b427-f1dc0a42903d","html_url":"https://github.com/gardener/kupid","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gardener%2Fkupid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gardener%2Fkupid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gard
ener%2Fkupid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gardener%2Fkupid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gardener","download_url":"https://codeload.github.com/gardener/kupid/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248514730,"owners_count":21117017,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["affinity","golang","kubernetes","kubernetes-controller","kubernetes-operator","nodeaffinity","pod-scheduling-criteria","scheduler","scheduling","scheduling-algorithms","scheduling-policies","tainting","tolerations","webhook"],"created_at":"2025-03-09T08:13:28.428Z","updated_at":"2026-01-16T06:54:44.556Z","avatar_url":"https://github.com/gardener.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kupid\n\n[![REUSE status](https://api.reuse.software/badge/github.com/gardener/kupid)](https://api.reuse.software/info/github.com/gardener/kupid)\n\nInject scheduling criteria into target pods orthogonally by policy definition.\n\n## Content\n\n- [kupid](#kupid)\n  - [Content](#content)\n  - [Goals](#goals)\n  - [Non-goals](#non-goals)\n  - [Development Installation](#development-installation)\n    - [Building the docker image](#building-the-docker-image)\n    - [Deploying kupid](#deploying-kupid)\n      - [Pre-requisites](#pre-requisites)\n      - [Using self-generated certificates](#using-self-generated-certificates)\n      - [Using cert-manager](#using-cert-manager)\n  - [Context](#context)\n  
  - [affinity](#affinity)\n      - [nodeAffinity](#nodeaffinity)\n      - [podAffinity](#podaffinity)\n      - [podAntiAffinity](#podantiaffinity)\n    - [nodeName](#nodename)\n    - [nodeSelector](#nodeselector)\n    - [schedulerName](#schedulername)\n    - [tolerations](#tolerations)\n  - [Problem](#problem)\n  - [Solution](#solution)\n    - [Sequence Diagram](#sequence-diagram)\n    - [PodSchedulingPolicy](#podschedulingpolicy)\n    - [ClusterPodSchedulingPolicy](#clusterpodschedulingpolicy)\n    - [Support for top-down pod scheduling criteria](#support-for-top-down-pod-scheduling-criteria)\n    - [Gardener Integration Sequence Diagram](#gardener-integration-sequence-diagram)\n    - [Pros](#pros)\n    - [Cons](#cons)\n    - [Mutating higher-order controllers](#mutating-higher-order-controllers)\n      - [Sequence Diagram](#sequence-diagram-1)\n      - [Gardener Integration Sequence Diagram](#gardener-integration-sequence-diagram-1)\n  - [Alternatives](#alternatives)\n    - [Propagate flexibility up the chain](#propagate-flexibility-up-the-chain)\n    - [Make assumptions](#make-assumptions)\n  - [Prior Art](#prior-art)\n    - [PodPreset](#podpreset)\n    - [Banzai Cloud Spot Config Webhook](#banzai-cloud-spot-config-webhook)\n    - [OPA Gatekeeper](#opa-gatekeeper)\n\n## Goals\n\n- Declare and manage many different forms of pod scheduling criteria for pods in a Kubernetes cluster. 
This includes `affinity` (node affinity, pod affinity and pod anti-affinity), `nodeName`, `nodeSelector`, `schedulerName` and `tolerations`.\n- Dynamically inject the relevant maintained pod scheduling criteria into pods during pod creation.\n- Allow pods to declare their own scheduling criteria, which would override any declaratively maintained policy in case of conflict.\n- Allow some namespaces and/or pods to be selected (or not selected) as targets for scheduling policies based on the label selector mechanism.\n- Generally, make it possible to cleanly separate (and orthogonally enforce) the concerns of how and where the workload deployed on a Kubernetes cluster should be scheduled from the controllers/operators that manage them.\n- Enable Gardener to deploy such a mechanism to inject pod scheduling criteria orthogonally into seed cluster workloads by deploying `kupid` as a Gardener extension along with suitable scheduling policy instances.\nThis is especially relevant for supporting dedicated worker pools for shoot `etcd` workload in the seed clusters.\n\n## Non-goals\n\n- Prevent pods from declaring their own scheduling criteria.\n- Prevent Gardener from supporting seed clusters which do not have any dedicated worker pools or any form of pod scheduling criteria for seed cluster workload.\n\n## Development Installation\n\nThe steps for installing kupid on a Kubernetes cluster for development and/or trial are given below.\nThese are development installation steps only and are not intended for production scenarios.\nFor anything other than development or trial purposes, please use your favorite CI/CD toolkit.\n\n### Building the docker image\n\nThe following steps explain how to build a docker image for kupid from the sources.\nThis is optional and can be skipped if the upstream docker image can be used.\n\n1. Build kupid locally. This step is optional if you are using the upstream container image for kupid.\n\n   ```sh\n   make webhook\n   ```\n\n1. Build the kupid container image. This step is optional if you are using the upstream container image for kupid.\n\n   ```sh\n   make docker-build\n   ```\n\n1. Push the container image to the container repository. This step is optional if you are using the upstream container image for kupid.\n\n   ```sh\n   make docker-push\n   ```\n\n### Deploying kupid\n\nFollow the steps below to deploy the kupid resources on the target Kubernetes cluster.\n\n#### Pre-requisites\n\nThe development environment relies on [kustomize](https://github.com/kubernetes-sigs/kustomize).\nPlease [install](https://kubectl.docs.kubernetes.io/installation/kustomize/) it in your development environment.\n\n#### Using self-generated certificates\n\nKupid requires TLS certificates to be configured for its validating and mutating webhooks.\nKupid can optionally generate the required TLS certificates, along with the default `ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`, automatically.\n\nDeploy the resources based on [`config/default/kustomization.yaml`](config/default/kustomization.yaml), which can be further customized (if required) before executing this step.\n\n```sh\nmake deploy\n```\n\n#### Using cert-manager\n\nAlternatively, kupid can be deployed with externally generated TLS certificates and custom `ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`.\nBelow is an example of doing this using [cert-manager](https://github.com/jetstack/cert-manager).\nPlease make sure the target Kubernetes cluster has a working [installation](https://cert-manager.io/docs/installation/kubernetes/) of cert-manager.\n\nDeploy the resources based on [`config/using-certmanager/kustomization.yaml`](config/using-certmanager/kustomization.yaml), which can be further customized (if required) before executing this step.\n\n```sh\nmake deploy-using-certmanager\n```\n\n## Context\n\nThe Kubernetes API provides several mechanisms for pods to influence how and where (on which node) they get scheduled in/by the Kubernetes 
cluster.\nAll such mechanisms involve the pods declaring fields in their [`PodSpec`](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core).\nAt present, there are five such mechanisms.\n\n### `affinity`\n\n[`Affinity`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) is one of the more sophisticated ways for a pod to influence where (on which node) it gets scheduled.\n\nIt has three sub-mechanisms.\n\n#### `nodeAffinity`\n\n[`NodeAffinity`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity) is similar to, but more expressive than, [`nodeSelector`](#nodeselector) as a way to constrain the subset of nodes in the cluster that are viable scheduling targets for the pod.\nAn example of how it can be used can be seen [here](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/pods/pod-with-node-affinity.yaml).\n\n#### `podAffinity`\n\n[`PodAffinity`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity) is a more subtle way to constrain the subset of nodes in the cluster that are viable scheduling targets for the pod.\nIn contrast to [`nodeAffinity`](#nodeaffinity), this is not done by directly identifying the viable candidate nodes through node label selector terms.\nInstead, it is done by selecting other, already scheduled pods that this pod should be collocated with.\nAn example of how it can be used can be seen [here](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/pods/pod-with-pod-affinity.yaml).\n\n#### `podAntiAffinity`\n\n[`PodAntiAffinity`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity) works in the opposite way to [`podAffinity`](#podaffinity).\nIt constrains the viable candidate nodes by selecting other, already scheduled pods that this pod should _not_ be 
collocated with.\nAn example of how it can be used can be seen [here](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/pods/pod-with-pod-affinity.yaml).\n\n### `nodeName`\n\n[`NodeName`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodename) is a crude mechanism that bypasses pod scheduling entirely: the pod itself declares which node it should run on.\n\n### `nodeSelector`\n\n[`NodeSelector`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) is a simple way to constrain the viable candidate nodes for scheduling by specifying a label selector that selects such viable nodes.\nAn example of how it can be used can be seen [here](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/pods/pod-nginx.yaml).\n\n### `schedulerName`\n\nKubernetes supports [multiple schedulers](https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/) for scheduling workloads in the cluster.\nIndividual pods can declare which scheduler should schedule them via the [`schedulerName`](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/admin/sched/pod3.yaml) field.\nThe additional schedulers must, of course, be [deployed separately](https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#define-a-kubernetes-deployment-for-the-scheduler).\n\n### `tolerations`\n\nKubernetes supports [`taints`](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/), which allow nodes to declaratively _repel_ pods from being scheduled on them.\nPods that need to be scheduled on such `taint`ed nodes must declare matching [`tolerations`](https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/pods/pod-with-toleration.yaml) for those `taints`.\nTypically, this functionality is used in combination with other ways of _attracting_ these pods to get scheduled on such 
`taint`ed nodes, such as [`nodeAffinity`](#nodeaffinity) or [`nodeSelector`](#nodeselector).\n\n## Problem\n\n- All the mechanisms for influencing the scheduling of pods described [above](#context) have to be specified top-down (or in other words, vertically) by the pods themselves (or by any higher-order component/controller/operator that deploys them).\n- Such a top-down approach forces all the components up the chain to be aware of the details of these mechanisms. That is, they must either [make some assumptions](#make-assumptions) at some stage about the pod scheduling criteria or expose the flexibility of specifying such pod scheduling criteria [all the way up the chain](#propagate-flexibility-up-the-chain).\n- Specifically, in the Gardener seed cluster, some workloads like `etcd` might be better off scheduled on dedicated worker pools so that other workloads and the common nodes on which they are scheduled can be scaled up and down more efficiently by the [`Cluster Autoscaler`](https://github.com/gardener/autoscaler/tree/machine-controller-manager/cluster-autoscaler).\nThis approach might also be used for other workloads, for other reasons, in the future (pre-emptible nodes for controller workloads?).\n- However, Gardener must not force all seed clusters to always have dedicated worker pools.\nIt should always be possible to use Gardener with plain-vanilla seed clusters with no dedicated worker pools.\nSupport for dedicated worker pools should be _optional_.\n\n## Solution\n\nThe proposed solution is to declare the pod scheduling criteria described [above](#context) in a [`CustomResourceDefinition`](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/) and then use a [mutating webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) to inject the relevant pod scheduling criteria into pods orthogonally when they are created.\n\n### Sequence 
Diagram\n\n![Sequence Diagram](docs/kupid-flow-pods.svg)\n\n### `PodSchedulingPolicy`\n\n[`PodSchedulingPolicy`](api/v1alpha1/podschedulingpolicy_types.go#L55-L60) is a namespaced CRD which describes, in its [`spec`](api/v1alpha1/podschedulingpolicy_types.go#L23-L50), all the pod scheduling criteria described [above](#context).\n\nThe target pods to which the `PodSchedulingPolicy` is applied can be selected via [`spec.podSelector`](api/v1alpha1/podschedulingpolicy_types.go#L29).\n\n### `ClusterPodSchedulingPolicy`\n\n[`ClusterPodSchedulingPolicy`](api/v1alpha1/clusterpodschedulingpolicy_types.go#L65-L70) is similar to [`PodSchedulingPolicy`](#podschedulingpolicy), but it is a non-namespaced (cluster-scoped) CRD which describes, in its [`spec`](api/v1alpha1/clusterpodschedulingpolicy_types.go#L23-L59), all the pod scheduling criteria described [above](#context).\n\nThe target pods to which the `ClusterPodSchedulingPolicy` is applied can be selected via [`spec.podSelector`](api/v1alpha1/podschedulingpolicy_types.go#L29).\n\nIn addition, the target namespaces to which the `ClusterPodSchedulingPolicy` is applied can be selected via [`spec.namespaceSelector`](api/v1alpha1/clusterpodschedulingpolicy_types.go#L43).\n\nThe specified pod scheduling policy is applied only to pods that match `spec.podSelector` and whose namespace also matches `spec.namespaceSelector`.\n\nAn explicitly specified empty selector matches all objects (i.e. namespaces and pods respectively).\n\nA `nil` selector (i.e. one not specified in the `spec`) matches no objects (i.e. 
namespaces and pods respectively).\n\n### Support for top-down pod scheduling criteria\n\nPods can continue to specify their scheduling criteria explicitly in a top-down way.\n\nOne way to make this possible is to use `spec.namespaceSelector` and `spec.podSelector` judiciously so that pods that specify their own scheduling criteria are not targeted by any of the declared scheduling policies.\n\nIf any declared [`PodSchedulingPolicy`](#podschedulingpolicy) or [`ClusterPodSchedulingPolicy`](#clusterpodschedulingpolicy) does apply to such pods, its pod scheduling criteria are merged with the scheduling criteria already specified in the pod.\n\nDuring merging, if the additional pod scheduling criteria conflict with the criteria already present in the pod, only the non-conflicting part of the additional criteria is merged and the conflicting part is skipped.\n\n### Gardener Integration Sequence Diagram\n\n![Gardener Integration Sequence Diagram](docs/gardener-integration-flow-pods.svg)\n\n### Pros\n\nThis solution has the following benefits.\n\n1. Systems that provision and manage workloads on the clusters, such as CI/CD pipelines, helm charts, operators and controllers, do not have to embed knowledge of the cluster topology.\n1. A cluster administrator can inject cluster topology constraints into the scheduling of workloads, including constraints that the provisioning systems do not take into account.\n1. A cluster administrator can enforce default cluster topology constraints on workloads as a policy.\n\n### Cons\n\n1. Pod creation goes through an additional mutating webhook. The resulting performance impact can be mitigated by using the `namespaceSelector` and `podSelector` fields in the policies judiciously.\n1. Pods already scheduled in the cluster are not affected by newly created policies. 
Pods must be recreated to get the new policies applied.\n\n### Mutating higher-order controllers\n\nThough this document talks about mutating `pods` dynamically to inject declaratively defined scheduling policies, in principle, it might be useful to mutate the pod templates in higher-order controller resources like `replicationcontrollers`, `replicasets`, `deployments`, `statefulsets`, `daemonsets`, `jobs` and `cronjobs` instead of (or in addition to) mutating `pods` directly.\nKupid supports this. Which objects are mutated can be controlled in the [`MutatingWebhookConfiguration`](config/webhook-config/mutating-webhook-config.yaml).\n\n#### Sequence Diagram\n\n![Sequence Diagram](docs/kupid-flow-controllers.svg)\n\n#### Gardener Integration Sequence Diagram\n\n![Gardener Integration Sequence Diagram](docs/gardener-integration-flow-controllers.svg)\n\n## Alternatives\n\n### Propagate flexibility up the chain\n\nExpose the flexibility of specifying pod scheduling criteria all the way up the chain.\nI.e. 
in `deployments`, `statefulsets`, operator CRDs, helm chart configuration or some other form of configuration.\nThis pollutes many layers with information that is not relevant at those levels.\n\n### Make assumptions\n\nMake some assumptions about the pod scheduling mechanism at some level of deployment and management of the workload.\nThis would be inflexible and would make it hard to change the pod scheduling behavior.\n\n## Prior Art\n\n### `PodPreset`\n\nThe standard [`PodPreset`](https://kubernetes.io/docs/concepts/workloads/pods/podpreset/) resource is limited to dynamically injecting environment variables, secrets, configmaps, volumes and volume mounts into `pods`.\nThere is no mechanism to define and inject other fields (especially those related to scheduling) into `pods`.\n\n### Banzai Cloud Spot Config Webhook\n\nThe [spot-config-webhook](https://github.com/banzaicloud/spot-config-webhook) is limited to dynamically injecting `schedulerName` into `pods`. There is no mechanism to define and inject other fields such as `affinity` or `tolerations`.\n\n### OPA Gatekeeper\n\nThe OPA [Gatekeeper](https://github.com/open-policy-agent/gatekeeper/) allows defining policies to validate and mutate any Kubernetes resource.\nTechnically, this can be used to dynamically inject anything, including scheduling policies, into `pods`.\nHowever, it is too large a component to introduce just for dynamically injecting scheduling policies.\nBesides, defining the policy as code is undesirable in this context because the policy itself would be non-declarative and hard to validate when it is deployed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgardener%2Fkupid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgardener%2Fkupid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgardener%2Fkupid/lists"}