{"id":51384996,"url":"https://github.com/cozystack/etcd-operator","last_synced_at":"2026-07-03T19:37:56.078Z","repository":{"id":226803602,"uuid":"768702598","full_name":"cozystack/etcd-operator","owner":"cozystack","description":"New generation community-driven etcd-operator!","archived":false,"fork":false,"pushed_at":"2026-07-03T07:10:41.000Z","size":1745,"stargazers_count":143,"open_issues_count":24,"forks_count":27,"subscribers_count":13,"default_branch":"main","last_synced_at":"2026-07-03T19:37:50.512Z","etag":null,"topics":["etcd","kubernetes","operator"],"latest_commit_sha":null,"homepage":"https://etcd.aenix.io","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cozystack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-07T15:21:03.000Z","updated_at":"2026-06-29T13:01:19.000Z","dependencies_parsed_at":"2024-03-09T20:22:55.758Z","dependency_job_id":"7af3ad9c-c0e4-4abe-8a59-c8ce7c988cd4","html_url":"https://github.com/cozystack/etcd-operator","commit_stats":null,"previous_names":["aenix-io/etcd-operator","cozystack/etcd-operator"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/cozystack/etcd-operator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cozystack%2Fetcd-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cozystack%2Fetcd-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cozystack%2Fetcd-operator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cozystack%2Fetcd-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cozystack","download_url":"https://codeload.github.com/cozystack/etcd-operator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cozystack%2Fetcd-operator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35099548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-03T02:00:05.635Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etcd","kubernetes","operator"],"created_at":"2026-07-03T19:37:55.445Z","updated_at":"2026-07-03T19:37:56.069Z","avatar_url":"https://github.com/cozystack.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# etcd-operator\n\nA Kubernetes operator for running [etcd](https://etcd.io/) clusters. Status: **early alpha** — API is `etcd-operator.cozystack.io/v1alpha2` and will likely change.\n\n## What it does\n\nThe operator manages etcd clusters via two custom resources:\n\n- **`EtcdCluster`** — what the user creates. Captures cluster-wide intent: replica count, etcd version, per-member storage size, a progress deadline.\n- **`EtcdMember`** — what the operator creates. One per etcd member. Owns its Pod and PVC. Operator-managed; users should not edit these directly.\n\nThere is no StatefulSet. Each member's Pod and PVC are reconciled independently so the operator can model protocol-aware lifecycle (learner-mode joins, member-id assignment, graceful removal, scale-to-zero pause/resume) without fighting StatefulSet's \"all replicas are one workload\" assumption.\n\nThe full design rationale is in [docs/concepts.md](docs/concepts.md).\n\n## What's supported today\n\n- **Bootstrap** of new clusters. Single seed first, learner-mode adds afterwards.\n- **Scale up / down**: cluster controller adds members one at a time as learners and promotes them; scale-down picks the most-recently-created member, runs `MemberRemove` via a finalizer, then GCs the Pod and PVC.\n- **Scale to zero (pause/resume)**: `spec.replicas: 0` parks the surviving member via `spec.dormant=true`; the Pod is deleted, the PVC stays owned by the `EtcdMember`. Scaling back up to ≥ 1 flips `spec.dormant=false` on the same member; etcd resumes from the existing data dir with the same cluster ID and member ID.\n- **Pod restart / node failure**: data PVC is preserved, the new Pod reads the existing WAL and rejoins with the same member ID.\n- **Memory-backed storage (opt-in)**: `spec.storage.medium: Memory` switches each member's data dir to a tmpfs `emptyDir` whose lifetime is bound to the Pod. Members that lose their Pod (eviction, node failure) lose their data; the operator detects this, removes the member from etcd, and replaces it via the existing scale-up path. Suits scenarios where the etcd state is reconstructable and replication absorbs single-member losses. For production, set `spec.affinity` and `spec.resources.limits.memory` explicitly — neither is defaulted ([#16](https://github.com/lllamnyp/etcd-operator/issues/16)); see [docs/concepts.md](docs/concepts.md#storage).\n- **Apiserver-enforced validation**: CEL rules on the CRD (k8s 1.29+) reject `replicas: 0` with `storage.medium: Memory`, `storage.size: 0` with `storage.medium: Memory`, `storage.medium` changes after creation, and `storage.size` shrinks. No webhook / cert-manager dependency.\n- **PodDisruptionBudget**: per-cluster PDB selects voting members only (`role=voter`); `maxUnavailable = (voters-1)/2` so `kubectl drain` cannot voluntarily push the cluster below quorum.\n- **TLS (BYO Secrets or cert-manager)**: `spec.tls.client` / `spec.tls.peer` enable TLS on each surface independently. Material comes from either user-provided Secrets (`serverSecretRef` / `operatorClientSecretRef` / `secretRef`) or operator-emitted `cert-manager.io/v1` Certificates (`certManager.{serverIssuerRef,operatorClientIssuerRef,issuerRef}`) — mutually exclusive per subtree, enforced by CEL. mTLS is the implicit mode when an operator-client source is supplied; server-TLS-only when it isn't. The whole `tls` subtree is CEL-locked immutable post-create. cert-manager-emitted certs auto-renew via cert-manager; Pod-side rotation is a manual one-at-a-time `kubectl delete pod` either way. See [docs/concepts.md](docs/concepts.md#tls).\n- **Resource sizing**: `spec.resources` (a `corev1.ResourceRequirements`) sets the etcd container's CPU/memory requests and limits. Unset uses a conservative 100m/128Mi-request default. Updates take effect on newly-created members; pair with a `VerticalPodAutoscaler` targeting the cluster for live recommendation/rollout.\n- **Scheduling \u0026 extra metadata**: `spec.affinity` and `spec.topologySpreadConstraints` pass through to every member Pod (anti-affinity is not defaulted — set it for production); `spec.additionalMetadata` merges user labels/annotations onto every object the operator creates (member Pods, data PVCs, Services, PDB, `EtcdMember` CRs), with operator-owned keys winning on collision. All three apply on object creation and are latched like the rest of the spec. See [docs/concepts.md](docs/concepts.md#pod-scheduling-and-additional-metadata).\n- **Monitoring / autoscaling hooks**: every member Pod always exposes a plaintext `metrics` container port at `2381` (etcd's `/health` + Prometheus `/metrics`) for `VMPodScrape` / `PodMonitor`. The `EtcdCluster` CRD exposes the `/scale` subresource with a populated `status.selector`, making it a valid target for `kubectl scale` and `VerticalPodAutoscaler.targetRef`.\n- **Locking pattern**: `status.observed` snapshots the in-flight target so mid-flight spec edits don't corrupt consensus; `progressDeadline` bounds how long the operator will spend trying to reach a target.\n- **Cluster deletion**: cascading owner refs clean up everything; finalizers detect \"the whole cluster is going away\" and skip etcd-side removal to avoid deadlock.\n- **Snapshots \u0026 restore**: `EtcdSnapshot` captures a one-shot snapshot of a cluster to S3 (or a PVC) via a Job running the operator image as a snapshot agent; `status.artifact` records the stored object's URI, size, and checksum. A new cluster restores from a snapshot at first bootstrap via `spec.bootstrap.restore.source` (the seed Pod runs a restore initContainer before etcd starts). TLS and `spec.auth` auth are honored automatically. No scheduled snapshots (`EtcdSnapshotSchedule` is intentionally out of scope) — drive recurring snapshots with a `CronJob`/`kubectl apply` from outside. See [docs/concepts.md](docs/concepts.md#snapshots--restore) and the [restore runbook](docs/operations.md#restoring-a-cluster-from-a-snapshot).\n\n## What's not supported (yet)\n\nNo multi-user / per-tenant RBAC inside etcd — single-user `root` auth is available via `spec.auth.enabled` (BYO credentials Secret; see [docs/concepts.md](docs/concepts.md#authentication)), but every authenticated client is `root`. No in-place version upgrades (changing `spec.version` only affects newly-created members). No PVC resizing — see [#2](https://github.com/lllamnyp/etcd-operator/issues/2). PVC-backed members auto-replace only on a *persistent crash-loop* (a lost or corrupt data dir whose etcd cannot boot) — quorum-gated, and far slower than the seconds-fast Pod-loss path memory-backed members get (tens of minutes, at the CrashLoopBackOff cap); a member that is merely slow or flapping is left alone. `status.brokenMembers` still reads 0 in practice — see [docs/concepts.md](docs/concepts.md#storage). One-shot snapshots and restore-on-bootstrap are supported (see above), but there is no *scheduled* snapshot CRD. No defragmentation scheduling. PodAntiAffinity is supported via `spec.affinity` but not applied by default (defaulting tracked in [#16](https://github.com/lllamnyp/etcd-operator/issues/16)). See the [issue tracker](https://github.com/lllamnyp/etcd-operator/issues) for the running follow-up list.\n\n## Quick start\n\n```sh\n# 1. Install the operator (CRDs + RBAC + manager) with Helm. Builds an image and\n#    pushes it to your registry; substitute IMG= for a prebuilt tag if you have\n#    one. The cluster must be able to pull from \u003cyour-registry\u003e — for local\n#    clusters (kind / minikube / k3d) sideload the image or use an ephemeral\n#    registry such as ttl.sh, otherwise the Deployment sits in ImagePullBackOff.\n#    `make deploy` runs `helm upgrade --install` (needs helm v3.16+ on PATH).\nmake docker-build docker-push deploy IMG=\u003cyour-registry\u003e/etcd-operator:\u003ctag\u003e\n\n# 2. Create a cluster.\ncat \u003c\u003c'EOF' | kubectl apply -f -\napiVersion: etcd-operator.cozystack.io/v1alpha2\nkind: EtcdCluster\nmetadata:\n  name: my-etcd\n  namespace: default\nspec:\n  replicas: 3\n  version: 3.6.11\n  storage:\n    size: 1Gi\nEOF\n\n# 3. Wait for ready and inspect.\nkubectl get etcdcluster.etcd-operator.cozystack.io my-etcd -w\nPOD=$(kubectl get pod -l etcd-operator.cozystack.io/cluster=my-etcd \\\n  -o jsonpath='{.items[0].metadata.name}')\nkubectl exec -it \"$POD\" -- etcdctl --endpoints=http://localhost:2379 \\\n  member list -w table\n```\n\nMember names are apiserver-assigned (`GenerateName=\"\u003ccluster\u003e-\"`) — don't hard-code them; use the cluster label selector.\n\nFor step-by-step setup, RBAC, image versions, and teardown see [docs/installation.md](docs/installation.md).\n\n## Documentation\n\n- **[Installation](docs/installation.md)** — deploy the operator, create your first cluster, networking pitfalls, upgrades.\n- **[Concepts](docs/concepts.md)** — design rationale: locking pattern, single-seed bootstrap, GenerateName naming, scale-to-zero mechanics, conditions reference.\n- **[Operations](docs/operations.md)** — runbook for day-2: scaling, pausing/resuming, decoding conditions, escalating stuck reconciles, broken-member recovery.\n- **[Migration](docs/migration.md)** — moving onto this operator from the legacy aenix operator; tracks behavioural changes that need an explicit migration step — currently the BYO root-credentials requirement when enabling auth.\n\n## Testing\n\n```sh\ngo test ./controllers/...\n```\n\nThe suite uses controller-runtime's fake client and a fake etcd client; no envtest assets needed at the unit level. Pinned behaviours:\n\n- **Bootstrap** — single-seed creation, idempotent recovery, `GenerateName`-assigned names.\n- **Locking pattern** — `status.observed` / `progressDeadline` lock the in-flight target; bootstrap-deadline is terminal.\n- **Scale up** — learner-mode add, readiness gate before the next step, crash-recovery branches between `Create` / `MemberAddAsLearner` / `Patch(initialCluster)`.\n- **Scale down** — `CreationTimestamp` DESC (name DESC tiebreak) victim selection, finalizer-driven `MemberRemove`.\n- **Scale to zero** — 1→0 Patches `spec.dormant=true`; 0→1 flips it back; dormant member's Pod is gone but its PVC is preserved.\n- **Discovery** — seed found via `spec.bootstrap=true`; etcd client endpoints filtered to voters (`MemberReady=True`) so `MemberList` doesn't route to a learner.\n- **Status no-churn** — steady-state reconciles don't repeatedly mutate status.\n\n## License\n\nApache 2.0. See `LICENSE`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcozystack%2Fetcd-operator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcozystack%2Fetcd-operator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcozystack%2Fetcd-operator/lists"}