{"id":19096606,"url":"https://github.com/sapcc/swift-drive-autopilot","last_synced_at":"2025-04-30T14:15:35.181Z","repository":{"id":12219528,"uuid":"70789089","full_name":"sapcc/swift-drive-autopilot","owner":"sapcc","description":"Finds and mounts Swift storage drives (also from within a container)","archived":false,"fork":false,"pushed_at":"2025-04-18T04:31:17.000Z","size":5975,"stargazers_count":8,"open_issues_count":6,"forks_count":0,"subscribers_count":55,"default_branch":"master","last_synced_at":"2025-04-18T17:40:05.856Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sapcc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-10-13T09:09:39.000Z","updated_at":"2025-04-18T04:31:19.000Z","dependencies_parsed_at":"2024-03-28T00:25:04.272Z","dependency_job_id":"d5580f60-18da-4835-a37f-d09e915b4289","html_url":"https://github.com/sapcc/swift-drive-autopilot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sapcc%2Fswift-drive-autopilot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sapcc%2Fswift-drive-autopilot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sapcc%2Fswift-drive-autopilot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sapcc%2Fswift-drive-autopilot/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sapcc","download_url":"https://codeload.github.com/sapcc/swift-drive-autopilot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251719470,"owners_count":21632684,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T03:37:13.124Z","updated_at":"2025-04-30T14:15:35.149Z","avatar_url":"https://github.com/sapcc.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# swift-drive-autopilot\n\nThis service finds, formats and mounts Swift storage drives, usually from\nwithin a container on a Kubernetes host.\n\n## How it works\n\nSwift expects its drives to be mounted at `/srv/node/$id`, where the `$id`\nidentifier is referenced in the cluster's **ring files**. The usual method is\nto set `$id` equal to the device's name in `/dev`, e.g. `/dev/sdc` becomes\n`/srv/node/sdc`, but that mapping is too rigid for some situations.\n\n`swift-drive-autopilot` establishes disk identity by examining a special file\ncalled `swift-id` in the root directory of the disk. In detail, it performs the\nfollowing steps:\n\n1. enumerate all storage drives (using a configurable list of globs)\n\n2. (optional) create a LUKS encryption container on fresh devices, or unlock an\n   existing one\n\n3. create an XFS filesystem on devices that do not have a filesystem yet\n\n4. mount each device below `/run/swift-storage` with a temporary name\n\n5. 
examine each device's `swift-id` file, and if it is present and unique,\n   mount it to `/srv/node/$id`\n\nAs a special case, disks with a `swift-id` of `\"spare\"` will not be mounted\ninto `/srv/node`, but will be held back as spare disks.\n\nThe autopilot then continues to run and will react to various types of events:\n\n1. A new device file appears. It will be decrypted and mounted (and formatted\n   if necessary).\n\n2. A device file disappears. Any active mounts or mappings will be cleaned up.\n   (This is especially helpful with hot-swappable hard drives.)\n\n3. The kernel log contains a line like `error on /dev/sda`. The offending\n   device will be marked as unhealthy and unmounted from `/srv/node`. The\n   other mappings and mounts are left intact for the administrator to inspect.\n\n   This means that you do not need `swift-drive-audit` if you're using the\n   autopilot.\n\n4. Mounts of managed devices disappear unexpectedly. The offending device will\n   be marked as unhealthy (see previous point).\n\n5. After a failure of one of the active disks, an operator removes the failed\n   disk, locates a spare disk and changes its `swift-id` to that of the failed\n   disk. The autopilot will mount the new disk in the place of the old one.\n\nInternally, events are collected by *collector* threads, and handled by the\nsingle *converger* thread.\n\n### Operational considerations\n\n`swift-drive-autopilot` runs under the assumption that a few disks are better\nthan no disks. If some operation relating to a single disk fails, the autopilot\nwill log an error and keep going. 
This means that it is absolutely crucial that\nyou have proper alerting in place for log messages with the `ERROR` label.\n\n## Installation\n\nTo build the binary:\n\n```bash\nmake\n```\n\nThe binary can also be installed with `go get`:\n```bash\ngo get github.com/sapcc/swift-drive-autopilot\n```\n\nTo build the Docker container: (Note that this requires a fairly recent Docker since a [multi-stage\nbuild](https://docs.docker.com/engine/userguide/eng-image/multistage-build/) is used.)\n\n```bash\ndocker build .\n```\n\nTo run the integration tests: (Note that this actually runs the autopilot on your system and thus requires root or `sudo` for mounting, device-mapper etc.)\n\n```bash\nmake check\n```\n\n## Development setup\n\nPlease see [HACKING.md](./HACKING.md).\n\n## Usage\n\nCall the binary with a configuration file as its single argument. The configuration file is a\nYAML file, and the following options are supported:\n\n```yaml\ndrives:\n  - /dev/sd[a-z]\n  - /dev/sd[a-z][a-z]\n```\n\nThe only required field, `drives`, contains the paths of the Swift storage\ndrives, as a list of shell globs.\n\nAs a special rule, the autopilot will ignore all drives that contain valid\npartition tables. This rule allows one to use a very general glob, like\n`/dev/sd[a-z]`, without knowing the actual disk layout in advance. The system\ninstallation will usually reside on a partitioned disk (because of the need for\nspecial partitions such as boot and swap partitions), so it will be ignored by\nthe autopilot. Any other disks can be used for non-Swift purposes as long as\nthey are partitioned into at least one partition.\n\nFor this reason, the two globs shown above will be appropriate for most\nsystems of all sizes.\n\n```yaml\nmetrics-listen-address: \":9102\"\n```\n\nIf given, expose a Prometheus metrics endpoint on this address below the path\n`/metrics`. The following metrics are provided:\n\n- `swift_drive_autopilot_events`: counter for handled events (labeled by `type`,\n  e.g. 
`type=drive-added`)\n\nIf Prometheus is used for alerting, it is useful to set an alert on\n`rate(swift_drive_autopilot_events{type=\"consistency-check\"}[5m])` (the `5m`\nwindow is only an example). Consistency check events should occur twice a minute.\n\n```yaml\nchroot: /coreos\n```\n\nIf `chroot` is set, commands like cryptsetup/mkfs/mount will be executed inside\nthe chroot. This allows using the host OS's utilities instead of those from\nthe container.\n\n```yaml\nchown:\n  user: \"1000\"\n  group: \"swift\"\n```\n\nIf `chown` is set, mountpoints below `/srv/node` and `/var/cache/swift` will be chown'ed to this user\nand/or group after mounting. Give the UID/GID or names of the Swift user and\ngroup here.\n\n```yaml\nkeys:\n  - secret: \"bzQoG5HN4onnEis5bhDmnYqqacoLNCSmDbFEAb3VDztmBtGobH\"\n  - secret: { fromEnv: ENVIRONMENT_VARIABLE }\n```\n\nIf `keys` is set, automatic disk encryption handling is activated. LUKS\ncontainers on the drives will be decrypted automatically, and empty drives will\nbe encrypted with LUKS before a filesystem is created.\n\nWhen decrypting, each of the keys is tried until one works, but only the first\none is used when creating new LUKS containers.\n\nCurrently, the `secret` will be used as the encryption key directly. Other key\nderivation schemes may be supported in the future.\n\nInstead of providing `secret` as plain text in the config file, you can use a\nspecial syntax (`fromEnv`) to read the respective encryption key from an\nexported environment variable.\n\n```yaml\nswift-id-pool: [ \"swift1\", \"swift2\", \"swift3\", \"swift4\", \"swift5\", \"swift6\" ]\n```\n\nIf `swift-id-pool` is set, when a new drive is formatted, it will be assigned an\nunused `swift-id` from this pool. This allows a new node to go from unformatted\ndrives to a fully operational Swift drive setup without any human intervention.\n\nAutomatic assignment will only happen during the initial formatting (i.e. 
when\nno LUKS container or filesystem or active mount is found on the drive).\nAutomatic assignment will *not* happen if there is any broken drive (since the\nautopilot cannot check the broken drive's `swift-id`, any automatic assignment\ncould result in a duplicate `swift-id`).\n\nIDs are assigned in the order in which they appear in the YAML file. If there\nare only four drives, using the configuration above, they will definitely be\nidentified as `swift1` through `swift4`.\n\nAs a special case, the ID `spare` may be given multiple times. The\nordering still matters, so disks are assigned an ID or reserved as spares in the\norder given. For example:\n\n```yaml\n# This will always keep two spare disks.\nswift-id-pool: [ \"spare\", \"spare\", \"swift1\", \"swift2\", \"swift3\", \"swift4\", \"swift5\", \"swift6\", ... ]\n\n# Something like this will keep one spare disk per three active disks.\nswift-id-pool: [ \"swift1\", \"swift2\", \"swift3\", \"spare\", \"swift4\", \"swift5\", \"swift6\", \"spare\", ... ]\n```\n\n### Runtime interface\n\n`swift-drive-autopilot` advertises its state by writing the files and\ndirectories listed below. Most of them reside in `/run/swift-storage/state`.\n(If a chroot is configured, this path refers to a location inside the chroot.)\nCurrently, the following files and directories are written:\n\n* `/run/swift-storage/state/flag-ready` is an empty file whose existence marks\n  that the autopilot has handled each available drive at least once. This flag\n  can be used to delay the startup of Swift services until storage is available.\n\n* `/run/swift-storage/state/unmount-propagation` is a directory containing a\n  symlink for each drive that was unmounted by the autopilot. The intention\n  of this mechanism is to propagate unmounting of broken drives to Swift\n  services running in separate mount namespaces. 
For example, if the other\n  service sees `/run/swift-storage/state/unmount-propagation/foo`, it shall\n  unmount `/srv/node/foo` from its local mount namespace.\n\n  `/run/swift-storage/state/unmount-propagation` can be ignored unless you have\n  Swift services running in multiple private mount namespaces, typically\n  because of containers and because your orchestrator cannot set up shared or\n  slave mount namespaces (e.g. Kubernetes). In plain Docker, pass `/srv/node`\n  to the Swift service with the `slave` or `shared` option, and mounts/unmounts\n  made by the autopilot will propagate automatically.\n\n* `/run/swift-storage/broken` is a directory containing symlinks to all drives\n  deemed broken by the autopilot. When the autopilot finds a broken device, its\n  log will explain why the device is considered broken, and how to reinstate the\n  device into the cluster after resolving the issue.\n\n* `/var/lib/swift-storage/broken` has the same structure and semantics as\n  `/run/swift-storage/broken`, but its contents are retained across reboots. A\n  flag from `/run/swift-storage/broken` can be copied to\n  `/var/lib/swift-storage/broken` to disable the device in a more durable way,\n  once a disk hardware error has been confirmed.\n\n  The durable broken flag can also be created manually using the command\n  `ln -s /dev/sd$LETTER /var/lib/swift-storage/broken/$SERIAL`. The disk's\n  serial number can be found using `smartctl -d scsi -i /dev/sd$LETTER`.\n\n* Since the autopilot also does the job of `swift-drive-audit`, it honors its\n  interface and writes `/var/cache/swift/drive.recon`. Drive errors detected by\n  the autopilot will thus show up in `swift-recon --driveaudit`.\n\n### In Docker\n\nWhen used as a container, supply the host's root filesystem as a bind-mount and\nset the `chroot` option to its mount point inside the container. 
Also, the\ncontainer has to run in privileged mode to access the host's block devices and\nperform mounts in the root mount namespace:\n\n```bash\n$ cat \u003e config.yml\ndrives:\n  - /dev/sd[c-z]\nchroot: /host\n$ docker run --privileged --rm -v $PWD/config.yml:/config.yml -v /:/host sapcc/swift-drive-autopilot:latest /config.yml\n```\n\n**Warning:** The **entire** host filesystem must be passed in as a single bind\nmount. Otherwise, the autopilot will be unable to correctly detect the mount\npropagation mode.\n\n### In Kubernetes\n\nYou will probably want to run this as a DaemonSet with the `nodeSelector`\nmatching your Swift storage nodes. As described for Docker above, make sure\nto mount the host's root filesystem into the container (with a `hostPath`\nvolume) and run the container in privileged mode (by setting\n`securityContext.privileged` to `true` in the container spec).\n\nAny other Swift containers should have access to the host's\n`/run/swift-storage/state` directory (using a `hostPath` volume) and wait for\nthe file `flag-ready` to appear before starting up.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsapcc%2Fswift-drive-autopilot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsapcc%2Fswift-drive-autopilot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsapcc%2Fswift-drive-autopilot/lists"}