{"id":16813094,"url":"https://github.com/ahrtr/etcd-defrag","last_synced_at":"2025-04-05T18:06:57.694Z","repository":{"id":154173468,"uuid":"629776015","full_name":"ahrtr/etcd-defrag","owner":"ahrtr","description":"An easier to use and smarter etcd defragmentation tool","archived":false,"fork":false,"pushed_at":"2025-03-24T20:43:58.000Z","size":242,"stargazers_count":103,"open_issues_count":6,"forks_count":12,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-29T17:07:12.444Z","etag":null,"topics":["defragmentation","etcd"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahrtr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-19T02:14:15.000Z","updated_at":"2025-03-26T16:34:37.000Z","dependencies_parsed_at":"2024-01-20T16:44:01.800Z","dependency_job_id":"df6abf09-0c73-47fe-9cea-7d921ca58d65","html_url":"https://github.com/ahrtr/etcd-defrag","commit_stats":null,"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahrtr%2Fetcd-defrag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahrtr%2Fetcd-defrag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahrtr%2Fetcd-defrag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahrtr%2Fetcd-defrag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahrtr","download_url":"https://codeload.github.com
/ahrtr/etcd-defrag/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247378141,"owners_count":20929296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["defragmentation","etcd"],"created_at":"2024-10-13T10:24:41.196Z","updated_at":"2025-04-05T18:06:57.669Z","avatar_url":"https://github.com/ahrtr.png","language":"Go","funding_links":[],"categories":["others","Go"],"sub_categories":[],"readme":"etcd-defrag\n======\n## Table of Contents\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n- [Overview](#overview)\n- [Integration with Kubernetes with a CronJob](#integration-with-kubernetes-with-a-cronjob)\n- [Examples](#examples)\n  - [Example 1: run defragmentation on one endpoint](#example-1-run-defragmentation-on-one-endpoint)\n  - [Example 2: run defragmentation on multiple endpoints](#example-2-run-defragmentation-on-multiple-endpoints)\n  - [Example 3: run defragmentation on all members in the cluster](#example-3-run-defragmentation-on-all-members-in-the-cluster)\n- [Defragmentation rule](#defragmentation-rule)\n- [Container image](#container-image)\n- [Contributing](#contributing)\n- [Note](#note)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n\n## Overview\netcd-defrag is an easier to use and smarter etcd defragmentation tool. 
It is based on the implementation\nof the `etcdctl defrag` command, but is heavily refactored and adds the following enhancements:\n- check the status of all members, and stop the operation if any member is unhealthy. Note that it ignores the `NOSPACE` alarm\n- run defragmentation on the leader last\n- support rule-based defragmentation\n\netcd-defrag reuses all the existing flags accepted by `etcdctl defrag`, so it doesn't break\nany existing user experience while providing additional benefits. Users can simply replace `etcdctl defrag [flags]`\nwith `etcd-defrag [flags]` without changing anything else.\n\nIt adds the following extra flags,\n| Flag                         | Description |\n|------------------------------|-------------|\n| `--compaction`               | whether to execute compaction before defragmentation, defaults to `true` |\n| `--continue-on-error`        | whether to continue defragmenting the next endpoint if the current one fails, defaults to `true` |\n| `--etcd-storage-quota-bytes` | etcd storage quota in bytes (the value passed to the etcd instance by the flag `--quota-backend-bytes`), defaults to `2*1024*1024*1024` |\n| `--defrag-rule`              | defragmentation rule (etcd-defrag will run defragmentation if the rule is empty or evaluates to true), defaults to empty. See more details below. |\n| `--dry-run`                  | evaluate whether or not endpoints require defragmentation, but don't actually perform it, defaults to `false`. |\n| `--exclude-localhost`        | whether to exclude localhost endpoints, defaults to `false`. |\n| `--move-leader`              | whether to move the leadership before performing defragmentation on the leader, defaults to `false`. 
|\n\nSee the complete flags below,\n```\n$ ./etcd-defrag -h\nA simple command line tool for etcd defragmentation\n\nUsage:\n  etcd-defrag [flags]\n\nFlags:\n      --cacert string                  verify certificates of TLS-enabled secure servers using this CA bundle\n      --cert string                    identify secure client using this TLS certificate file\n      --cluster                        use all endpoints from the cluster member list\n      --command-timeout duration       command timeout (excluding dial timeout) (default 30s)\n      --compaction                     whether execute compaction before the defragmentation (defaults to true) (default true)\n      --continue-on-error              whether continue to defragment next endpoint if current one fails (default true)\n      --defrag-rule string             defragmentation rule (etcd-defrag will run defragmentation if the rule is empty or it is evaluated to true)\n      --dial-timeout duration          dial timeout for client connections (default 2s)\n  -d, --discovery-srv string           domain name to query for SRV records describing cluster endpoints\n      --discovery-srv-name string      service name to query when using DNS discovery\n      --dry-run                        evaluate whether or not endpoints require defragmentation, but don't actually perform it\n      --endpoints strings              comma separated etcd endpoints (default [127.0.0.1:2379])\n      --etcd-storage-quota-bytes int   etcd storage quota in bytes (the value passed to etcd instance by flag --quota-backend-bytes) (default 2147483648)\n      --exclude-localhost              whether to exclude localhost endpoints\n  -h, --help                           help for etcd-defrag\n      --insecure-discovery             accept insecure SRV records describing cluster endpoints (default true)\n      --insecure-skip-tls-verify       skip server certificate verification (CAUTION: this option should be enabled only for testing 
purposes)\n      --insecure-transport             disable transport security for client connections (default true)\n      --keepalive-time duration        keepalive time for client connections (default 2s)\n      --keepalive-timeout duration     keepalive timeout for client connections (default 6s)\n      --key string                     identify secure client using this TLS key file\n      --move-leader                    whether to move the leadership before performing defragmentation on the leader\n      --password string                password for authentication (if this option is used, --user option shouldn't include password)\n      --user string                    username[:password] for authentication (prompt if password is not supplied)\n      --version                        print the version and exit\n```\n\nFlags can also be set through environment variables: take the flag name, convert it to uppercase, replace all hyphens with underscores, and prefix it with `ETCD_DEFRAG_`. For example, the flag `--move-leader` can be set with the environment variable `ETCD_DEFRAG_MOVE_LEADER`.\n\nFlag values are resolved in the following order (from highest to lowest priority):\n\n1. Flags passed as command line arguments\n2. Environment variables\n3. Default values\n\n## Integration with Kubernetes with a CronJob\n\nYou can use [the example CronJob in\n`./doc/etcd-defrag-cronjob.yaml`](./doc/etcd-defrag-cronjob.yaml) on Kubernetes\nenvironments where the etcd servers are colocated with the control plane nodes.\n\nThis example CronJob runs every weekday in the morning, and works by mounting\nthe `/etc/kubernetes/pki/etcd` folder inside the pod, which makes it possible to\ndefragment the etcd cluster inside the Kubernetes cluster itself. 
For more\ncomplex use cases you might need to adapt the `--endpoints` and/or the certificates.\n\nThe example CronJob is configured by default with\n`node-role.kubernetes.io/control-plane` affinity, and with the `hostNetwork:\ntrue` spec, so that the `etcd` server co-located with the apiserver can be\nreached directly at `127.0.0.1:2379`.\n\n## Examples\n### Example 1: run defragmentation on one endpoint\nCommand:\n```\n$ ./etcd-defrag --endpoints=https://127.0.0.1:22379 --cacert ./ca.crt --key ./etcd-defrag.key --cert ./etcd-defrag.crt\n```\n\n### Example 2: run defragmentation on multiple endpoints\nCommand:\n```\n$ ./etcd-defrag --endpoints=https://127.0.0.1:22379,https://127.0.0.1:32379 --cacert ./ca.crt --key ./etcd-defrag.key --cert ./etcd-defrag.crt\n```\n\n### Example 3: run defragmentation on all members in the cluster\nCommand:\n```\n$ ./etcd-defrag --endpoints https://127.0.0.1:22379 --cluster --cacert ./ca.crt --key ./etcd-defrag.key --cert ./etcd-defrag.crt\n```\nOutput:\n```\nValidating configuration.\nNo defragmentation rule provided\nPerforming health check.\nendpoint: https://127.0.0.1:2379, health: true, took: 4.702492ms, error:\nendpoint: https://127.0.0.1:22379, health: true, took: 5.017075ms, error:\nendpoint: https://127.0.0.1:32379, health: true, took: 4.747068ms, error:\nGetting members status\nendpoint: https://127.0.0.1:2379, dbSize: 172032, dbSizeInUse: 126976, memberId: 8211f1d0f64f3269, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10425\nendpoint: https://127.0.0.1:22379, dbSize: 122880, dbSizeInUse: 122880, memberId: 91bc3c398fb3c146, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10425\nendpoint: https://127.0.0.1:32379, dbSize: 122880, dbSizeInUse: 122880, memberId: fd422379fda50e48, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10425\nRunning compaction until revision: 10365 ... 
successful\n3 endpoint(s) need to be defragmented: [https://127.0.0.1:22379 https://127.0.0.1:32379 https://127.0.0.1:2379]\n[Before defragmentation] endpoint: https://127.0.0.1:22379, dbSize: 126976, dbSizeInUse: 90112, memberId: 91bc3c398fb3c146, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\nDefragmenting endpoint \"https://127.0.0.1:22379\"\nFinished defragmenting etcd endpoint \"https://127.0.0.1:22379\". took 224.151378ms\n[Post defragmentation] endpoint: https://127.0.0.1:22379, dbSize: 90112, dbSizeInUse: 81920, memberId: 91bc3c398fb3c146, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\n[Before defragmentation] endpoint: https://127.0.0.1:32379, dbSize: 126976, dbSizeInUse: 90112, memberId: fd422379fda50e48, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\nDefragmenting endpoint \"https://127.0.0.1:32379\"\nFinished defragmenting etcd endpoint \"https://127.0.0.1:32379\". took 139.138035ms\n[Post defragmentation] endpoint: https://127.0.0.1:32379, dbSize: 90112, dbSizeInUse: 81920, memberId: fd422379fda50e48, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\n[Before defragmentation] endpoint: https://127.0.0.1:2379, dbSize: 172032, dbSizeInUse: 94208, memberId: 8211f1d0f64f3269, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\nDefragmenting endpoint \"https://127.0.0.1:2379\"\nFinished defragmenting etcd endpoint \"https://127.0.0.1:2379\". 
took 135.171807ms\n[Post defragmentation] endpoint: https://127.0.0.1:2379, dbSize: 90112, dbSizeInUse: 81920, memberId: 8211f1d0f64f3269, leader: 8211f1d0f64f3269, revision: 10365, term: 2, index: 10426\nThe defragmentation is successful.\n```\n\nOnly one endpoint is provided, but etcd-defrag still runs defragmentation on all members in the cluster thanks to the `--cluster` flag.\nNote that the endpoint `https://127.0.0.1:2379` is the leader, so it's placed at the end of the list,\n```\n3 endpoint(s) need to be defragmented: [https://127.0.0.1:22379 https://127.0.0.1:32379 https://127.0.0.1:2379]\n```\n```\n$ etcdctl endpoint status -w table --cluster\n+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |\n+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n|  https://127.0.0.1:2379 | 8211f1d0f64f3269 |   3.5.8 |   25 kB |      true |      false |        10 |        164 |                164 |        |\n| https://127.0.0.1:22379 | 91bc3c398fb3c146 |   3.5.8 |   25 kB |     false |      false |        10 |        164 |                164 |        |\n| https://127.0.0.1:32379 | fd422379fda50e48 |   3.5.8 |   25 kB |     false |      false |        10 |        164 |                164 |        |\n+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n```\n## Defragmentation rule\nDefragmentation is an expensive operation, so it should be executed as infrequently as possible. On the other hand,\nit's also necessary to make sure that no etcd member runs out of its storage quota. 
That is exactly\nwhy the defragmentation rule was introduced: it skips unnecessary, expensive defragmentation while still keeping\neach member safe.\n\nUsers can configure a defragmentation rule using the flag `--defrag-rule`. The rule must be a boolean expression,\nwhich means its evaluation result should be a boolean value. **It supports the arithmetic (e.g. `+` `-` `*` `/` `%`) and logical\n(e.g. `==` `!=` `\u003c` `\u003e` `\u003c=` `\u003e=` `\u0026\u0026` `||` `!`) operators supported by Go. Parentheses `()` can be used to control precedence**.\n\nCurrently, `etcd-defrag` supports the five variables below,\n| Variable name   | Description |\n|---------------  |-------------|\n| `dbSize`        | total size of the etcd database |\n| `dbSizeInUse`   | total size in use of the etcd database |\n| `dbSizeFree`    | total size not in use of the etcd database, defined as dbSize - dbSizeInUse|\n| `dbQuota`       | etcd storage quota in bytes (the value passed to etcd instance by flag --quota-backend-bytes)|\n| `dbQuotaUsage`  | total usage of the etcd storage quota, defined as dbSize/dbQuota |\n\nFor example, to run defragmentation when the total db size is greater than 80%\nof the quota **OR** there is at least 200MiB free space, the defragmentation rule is `dbSize \u003e dbQuota*80/100 || dbSize - dbSizeInUse \u003e 200*1024*1024`.\nThe complete command is below,\n```\n$ ./etcd-defrag --endpoints http://127.0.0.1:22379 --cluster --defrag-rule=\"dbSize \u003e dbQuota*80/100 || dbSize - dbSizeInUse \u003e 200*1024*1024\"\n```\nOr,\n```\n$ ./etcd-defrag --endpoints http://127.0.0.1:22379 --cluster --defrag-rule=\"dbQuotaUsage \u003e 0.8 || dbSizeFree \u003e 200*1024*1024\"\n```\n\nOutput:\n```\nValidating configuration.\nValidating the defragmentation rule: dbSize \u003e dbQuota*80/100 || dbSize - dbSizeInUse \u003e 200*1024*1024 ... 
valid\nPerforming health check.\nendpoint: http://127.0.0.1:2379, health: true, took: 6.993264ms, error:\nendpoint: http://127.0.0.1:32379, health: true, took: 7.483368ms, error:\nendpoint: http://127.0.0.1:22379, health: true, took: 49.441931ms, error:\nGetting members status\nendpoint: http://127.0.0.1:2379, dbSize: 131072, dbSizeInUse: 131072, memberId: 8211f1d0f64f3269, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11028\nendpoint: http://127.0.0.1:22379, dbSize: 131072, dbSizeInUse: 131072, memberId: 91bc3c398fb3c146, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11028\nendpoint: http://127.0.0.1:32379, dbSize: 131072, dbSizeInUse: 131072, memberId: fd422379fda50e48, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11028\nRunning compaction until revision: 10964 ... successful\n3 endpoint(s) need to be defragmented: [http://127.0.0.1:22379 http://127.0.0.1:32379 http://127.0.0.1:2379]\n[Before defragmentation] endpoint: http://127.0.0.1:22379, dbSize: 139264, dbSizeInUse: 90112, memberId: 91bc3c398fb3c146, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11029\nEvaluation result is false, so skipping endpoint: http://127.0.0.1:22379\n[Before defragmentation] endpoint: http://127.0.0.1:32379, dbSize: 139264, dbSizeInUse: 139264, memberId: fd422379fda50e48, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11029\nEvaluation result is false, so skipping endpoint: http://127.0.0.1:32379\n[Before defragmentation] endpoint: http://127.0.0.1:2379, dbSize: 139264, dbSizeInUse: 90112, memberId: 8211f1d0f64f3269, leader: 8211f1d0f64f3269, revision: 10964, term: 2, index: 11029\nEvaluation result is false, so skipping endpoint: http://127.0.0.1:2379\nThe defragmentation is successful.\n```\n\nIf you want to run defragmentation when both conditions are true, namely the total db size is greater than 80%\nof the quota **AND** there is at least 200MiB free space, then run command below,\n```\n$ ./etcd-defrag --endpoints 
http://127.0.0.1:22379 --cluster --defrag-rule=\"dbSize \u003e dbQuota*80/100 \u0026\u0026 dbSize - dbSizeInUse \u003e 200*1024*1024\"\n```\n\n## Container image\nContainer images are released automatically using GitHub Actions and [`ko-build/ko`](https://github.com/ko-build/ko).\nThey can be used as follows:\n\n```bash\n$ docker pull ghcr.io/ahrtr/etcd-defrag:latest\n```\n\nAlternatively, you can build your own container images with:\n\n```bash\n$ DOCKER_BUILDKIT=1 docker build -t \"etcd-defrag:${VERSION}\" -f Dockerfile .\n```\n\nIf you need an image for a `GOARCH` other than `amd64` or `arm64` (e.g. `ppc64le` or `s390x`), use a command like the one below,\n```bash\n$ DOCKER_BUILDKIT=1 docker build --build-arg ARCH=${ARCH} -t \"etcd-defrag:${VERSION}\" -f Dockerfile .\n```\n\n## Contributing\nAny contribution is welcome!\n\n## Note\n- Please make sure you are running etcd version \u003e= 3.5.6, and read [Two possible data inconsistency issues in etcd](https://groups.google.com/g/etcd-dev/c/8S7u6NqW6C4) for more details.\n- Please do not include learner members' endpoints in `--endpoints`; see the discussion in https://github.com/ahrtr/etcd-defrag/issues/26.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahrtr%2Fetcd-defrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahrtr%2Fetcd-defrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahrtr%2Fetcd-defrag/lists"}