{"id":22379124,"url":"https://github.com/sky-uk/clusterverse","last_synced_at":"2025-07-31T01:31:49.221Z","repository":{"id":41853613,"uuid":"236051882","full_name":"sky-uk/clusterverse","owner":"sky-uk","description":"Full-lifecycle cloud infrastructure cluster management, using Ansible","archived":false,"fork":false,"pushed_at":"2024-11-28T15:52:39.000Z","size":950,"stargazers_count":15,"open_issues_count":3,"forks_count":8,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-11-28T16:44:48.137Z","etag":null,"topics":["ansible","ansible-role","automation","aws","gcp","lifecycle-management"],"latest_commit_sha":null,"homepage":"","language":"Groovy","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sky-uk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-24T17:41:03.000Z","updated_at":"2024-05-30T11:24:05.000Z","dependencies_parsed_at":"2023-02-14T08:16:57.612Z","dependency_job_id":"fbec700c-a021-4b36-8e34-83142fdfa5a2","html_url":"https://github.com/sky-uk/clusterverse","commit_stats":null,"previous_names":[],"tags_count":67,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sky-uk%2Fclusterverse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sky-uk%2Fclusterverse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sky-uk%2Fclusterverse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sky-uk%2Fclusterverse/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sky-uk","download_url":"https://codeload.github.com/sky-uk/clusterverse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228204606,"owners_count":17884711,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","ansible-role","automation","aws","gcp","lifecycle-management"],"created_at":"2024-12-04T23:09:01.298Z","updated_at":"2024-12-04T23:09:02.086Z","avatar_url":"https://github.com/sky-uk.png","language":"Groovy","funding_links":[],"categories":[],"sub_categories":[],"readme":"# clusterverse  \u0026nbsp; [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) ![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen.svg)\nA full-lifecycle, immutable cloud infrastructure cluster management **role**, using Ansible.\n+ **Multi-cloud:** clusterverse can manage the cluster lifecycle in AWS and GCP.\n+ **Deploy:**  You define your infrastructure as code (in Ansible yaml), and clusterverse will deploy it.\n+ **Scale-up:**  If you change the cluster definitions and rerun the deploy, new nodes will be added.\n+ **Redeploy (e.g. up-version):** If you need to up-version or replace the underlying OS (i.e. to achieve fully immutable, zero-patching redeploys), the `redeploy.yml` playbook will replace each node in the cluster (via various redeploy schemes), and roll back if any failures occur. 
\n\n**clusterverse** is designed to manage base-vm infrastructure that underpins cluster-based infrastructure, for example, Couchbase, Kafka, Elasticsearch, or Cassandra.\n\n## Contributing\nContributions are welcome and encouraged.  Please see [CONTRIBUTING.md](https://github.com/sky-uk/clusterverse/blob/master/CONTRIBUTING.md) for details.\n\n## Requirements\n\n### Python dependencies\nDependencies are managed via pipenv:\n+ `pipenv install` will create a Python virtual environment with the dependencies specified in the Pipfile.\n\nTo activate the pipenv:\n+ `pipenv shell`\n+ or prepend the ansible-playbook commands with: `pipenv run`\n\n### AWS\n+ AWS account with IAM rights to create EC2 VMs and security groups in the chosen VPCs/subnets.  Place the credentials in:\n  + `cluster_vars[buildenv].aws_access_key:`\n  + `cluster_vars[buildenv].aws_secret_key:`\n+ Preexisting VPCs:\n  + `cluster_vars[buildenv].vpc_name: my-vpc-{{buildenv}}`\n+ Preexisting subnets. The name given is a prefix; the cloud availability zone will be appended to the end (e.g. `a`, `b`, `c`).\n  + `cluster_vars[buildenv].vpc_subnet_name_prefix: my-subnet-{{region}}`\n+ Preexisting keys (in AWS IAM):\n  + `cluster_vars[buildenv].key_name: my_key__id_rsa`\n\n### GCP\n+ Create a gcloud account.\n+ Create a service account in `IAM \u0026 Admin` / `Service Accounts`.  Download the json file locally.\n+ Store the contents within the `cluster_vars[buildenv].gcp_service_account_rawtext` variable. \n  + During execution, the json file will be copied locally because the Ansible GCP modules often require the file as input. \n+ The Google Cloud SDK needs to be installed to run the `gcloud` command-line tool (e.g. to disable delete protection); this is handled by `pipenv install`.\n\n### DNS\nDNS is optional.  If unset, no DNS names will be created.  If DNS is required, you will need a DNS zone delegated to one of the following:\n+ nsupdate (e.g. 
bind9)\n+ AWS Route53\n+ Google Cloud DNS\n\nCredentials for the DNS server will also be required. These are specified in the `cluster_vars` variable described below.\n\n\n### Cluster Definition Variables\nClusters are defined as code within Ansible yaml files that are imported at runtime.  Because clusters are built from scratch on the localhost, the automatic Ansible `group_vars` inclusion cannot work with anything except the special `all.yml` group (actual `groups` need to be in the inventory, which cannot exist until the cluster is built).  The `group_vars/all.yml` file is instead used to bootstrap _merge_vars_.\n\n#### merge_vars\nClusterverse is designed to deploy the same clusters in multiple clouds and multiple environments, potentially using similar configurations.  To avoid duplicating configuration (adhering to the [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) principle), a new [action plugin](https://docs.ansible.com/ansible/latest/dev_guide/developing_plugins.html#action-plugins) called `merge_vars` has been developed for use in place of the standard `include_vars`.  It allows users to define variables hierarchically, with each file able to include (and potentially override) the variables defined before it.  This plugin is similar to `include_vars`, but when it finds dictionaries that have already been defined, it _combines_ them instead of replacing them. \n\n```yaml\n- merge_vars:\n    ignore_missing_files: True\n    from: \"{{ merge_dict_vars_list }}\"     #defined in `group_vars/all.yml`\n```\n + The _ignore_missing_files_ option can be set so that any files or directories in the `from` list that are not found will not raise an error.\n\n\u003cbr/\u003e\n\n##### merge_dict_vars_list - hierarchical:\nIn the case of a fully hierarchical set of cluster definitions where each directory is a variable (e.g. 
_cloud_ (aws or gcp), _region_ (eu-west-1), _buildenv_ (sandbox) and _cluster_id_ (test)), the folders may look like:  \n\n```text\ncluster_defs/\n|-- aws\n|   |-- eu-west-1\n|   |   |-- sandbox\n|   |   |   |-- test\n|   |   |   |   `-- cluster_vars.yml\n|   |   |   `-- cluster_vars.yml\n|   |   `-- cluster_vars.yml\n|   `-- cluster_vars.yml\n|-- gcp\n|   |-- europe-west1\n|   |   `-- sandbox\n|   |       |-- test\n|   |       |   `-- cluster_vars.yml\n|   |       `-- cluster_vars.yml\n|   `-- cluster_vars.yml\n|-- app_vars.yml\n`-- cluster_vars.yml\n```\n\n`group_vars/all.yml` would contain `merge_dict_vars_list` with the files and directories listed from top to bottom, each entry overriding its predecessors:\n```yaml\nmerge_dict_vars_list:\n  - \"./cluster_defs/cluster_vars.yml\"\n  - \"./cluster_defs/app_vars.yml\"\n  - \"./cluster_defs/{{ cloud_type }}/\"\n  - \"./cluster_defs/{{ cloud_type }}/{{ region }}/\"\n  - \"./cluster_defs/{{ cloud_type }}/{{ region }}/{{ buildenv }}/\"\n  - \"./cluster_defs/{{ cloud_type }}/{{ region }}/{{ buildenv }}/{{ clusterid }}/\"\n```\n\n\u003cbr/\u003e\n\n##### merge_dict_vars_list - flat:\n\nIt is also valid to define all the variables in a single sub-directory:\n```text\ncluster_defs/\n|-- test_aws_euw1\n|   |-- app_vars.yml\n|   `-- cluster_vars.yml\n`-- test_gcp_euw1\n    |-- app_vars.yml\n    `-- cluster_vars.yml\n```\nIn this case, `merge_dict_vars_list` would contain only the single cluster directory (using `clusterid` as a variable).  `merge_vars` does not recurse through directories.\n```yaml\nmerge_dict_vars_list:\n  - \"./cluster_defs/{{ clusterid }}\"\n```\n\n\u003cbr/\u003e\n\n#### /group_vars/{{cluster_id}}/*.yml:\nIf `merge_dict_vars_list` is not defined, it is still possible to put the flat variables in `/group_vars/{{cluster_id}}`, where they will be imported using the standard `include_vars` plugin.  
\n\nThis offers no advantage over defining the same cluster yaml files in the directory structure described in the `merge_dict_vars_list - flat` merge_vars technique above, which is the preferred approach. \n\n\u003cbr/\u003e\n\n### Cloud Credential Management\nCredentials can be encrypted inline in the playbooks using [ansible-vault](https://docs.ansible.com/ansible/latest/user_guide/vault.html).\n+ Because multiple environments are supported, it is recommended to use [vault-ids](https://docs.ansible.com/ansible/latest/user_guide/vault.html#managing-multiple-passwords-with-vault-ids) and to have credentials per environment (e.g. to help avoid accidentally running a deploy on prod).\n+ There is a small script (`.vaultpass-client.py`) that returns a password stored in an environment variable (`VAULT_PASSWORD_BUILDENV`) to Ansible. Setting this variable is mandatory within Clusterverse: whenever sensitive data encrypted with `ansible-vault` needs to be decrypted, the password set in this variable is used. This is particularly useful when running within Jenkins.\n  + `export VAULT_PASSWORD_BUILDENV=\u003c'dev/stage/prod' password\u003e`\n+ To encrypt sensitive information, you must ensure that your current working directory can see the script `.vaultpass-client.py` and that `VAULT_PASSWORD_BUILDENV` has been set:\n  + `ansible-vault encrypt_string --vault-id=sandbox@.vaultpass-client.py --encrypt-vault-id=sandbox`\n    + An example of a sensitive value could be your `aws_secret_key`. When you run the command above, a prompt such as the following will appear:\n    ```\n    ansible-vault encrypt_string --vault-id=sandbox@.vaultpass-client.py --encrypt-vault-id=sandbox\n    Reading plaintext input from stdin. (ctrl-d to end input)\n    ```\n    + Enter your plaintext input, then when finished press `CTRL-D` on your keyboard. Sometimes the keystroke is only echoed as `^D`; if so, press the same combination again and the encrypted value will be displayed. 
Copy this as the value for your variable within your `cluster_vars.yml` or `app_vars.yml` files, as in the example below:\n    ```\n    aws_secret_key: !vault |-\n      $ANSIBLE_VAULT;1.2;AES256;sandbox\n      7669080460651349243347331538721104778691266429457726036813912140404310\n    ```\n    + Note that the `!vault |-` tag is compulsory for the value to be decrypted successfully.\n+ To decrypt, either run the playbook with the correct `VAULT_PASSWORD_BUILDENV` and print the value with `debug: msg={{myvar}}`, or:\n  + `echo '$ANSIBLE_VAULT;1.2;AES256;sandbox`\n  `86338616...33630313034' | ansible-vault decrypt --vault-id=sandbox@.vaultpass-client.py`  \n  + **or**, to decrypt using a non-exported password:\n  + `echo '$ANSIBLE_VAULT;1.2;AES256;sandbox`\n  `86338616...33630313034' | ansible-vault decrypt --ask-vault-pass`\n\n\n---\n## Usage\n**clusterverse** is an Ansible _role_, and as such must be imported into your \\\u003cproject\\\u003e/roles directory.  There is a full-featured example in the [/EXAMPLE](https://github.com/sky-uk/clusterverse/tree/master/EXAMPLE) subdirectory.\n\nTo import the role into your project, create a [`requirements.yml`](https://github.com/sky-uk/clusterverse/blob/master/EXAMPLE/requirements.yml) file containing:\n```yaml\nroles:\n  - name: clusterverse\n    src: https://github.com/sky-uk/clusterverse\n    version: master          ## branch, hash, or tag \n```\n+ If you use a `cluster.yml` file similar to the example found in [EXAMPLE/cluster.yml](https://github.com/sky-uk/clusterverse/blob/master/EXAMPLE/cluster.yml), clusterverse will be installed from Ansible Galaxy _automatically_ on each run of the playbook.\n\n+ To install it manually: `ansible-galaxy install -r requirements.yml -p /\u003cproject\u003e/roles/`\n\n\n### Invocation\n\n_**For full invocation examples and command-line arguments, please see the [example README.md](https://github.com/sky-uk/clusterverse/blob/master/EXAMPLE/README.md)**_\n\nThe role is designed to run in two modes:\n#### Deploy (also
performs _scaling_ and _repairs_)\n+ A playbook based on the [cluster.yml example](https://github.com/sky-uk/clusterverse/tree/master/EXAMPLE/cluster.yml) will be needed.\n+ The `cluster.yml` sub-role immutably deploys a cluster from the config defined above.  If it is run again (with no changes to variables), it will do nothing.  If the cluster variables are changed (e.g. a host is added), the cluster will reflect the new variables (e.g. a new host will be added to the cluster).  Note: it _will not remove_ nodes, nor, usually, will it reflect changes to disk volumes; these are limitations of the underlying cloud modules.\n\n\n#### Redeploy\n+ A playbook based on the [redeploy.yml example](https://github.com/sky-uk/clusterverse/tree/master/EXAMPLE/redeploy.yml) will be needed.\n+ The `redeploy.yml` sub-role will completely redeploy the cluster; this is useful, for example, to upgrade the underlying operating system version.\n+ It supports `canary` deploys.  The `canary` extra variable must be set on the command line to one of: `start`, `finish`, `filter`, `none` or `tidy`.\n+ It contains callback hooks:\n  + `mainclusteryml`: This is the name of the deployment playbook.  It is called to deploy nodes for the new cluster, or to roll back a failed deployment.  It should be set to the value of the primary _deploy_ playbook yml (e.g. `cluster.yml`).\n  + `predeleterole`: This is the name of a role that should be called prior to deleting VMs; it is used, for example, to eject nodes from a Couchbase cluster.  It takes a list of `hosts_to_remove` VMs. \n+ It supports pluggable redeployment schemes.  The following are provided:\n  + **_scheme_rmvm_rmdisk_only**\n      + This is a very basic rolling redeployment of the cluster.  \n      + _Supports redeploying to bigger or smaller clusters (where **no recovery** is possible)_.\n      + **It assumes a resilient deployment (it can tolerate one node being deleted from the cluster).
There is _no rollback_ in case of failure.**\n      + For each node in the cluster:\n        + Run `predeleterole`\n        + Delete/terminate the node (note: this is _irreversible_).\n        + Run the main `cluster.yml` (with the same parameters as for the main playbook), which forces the missing node to be redeployed (the `cluster_suffix` remains the same).\n      + If `canary=start`, only the first node is redeployed.  If `canary=finish`, only the remaining (non-first) nodes are redeployed.  If `canary=none`, all nodes are redeployed.\n      + If `canary=filter`, you must also pass `canary_filter_regex=regex` where `regex` is a pattern that matches the hostnames of the VMs that you want to target.\n      + If the process fails at any point:\n        + No further VMs will be deleted or rebuilt; the playbook stops. \n  + **_scheme_addnewvm_rmdisk_rollback**\n      + _Supports redeploying to bigger or smaller clusters_\n      + For each node in the cluster:\n        + Create a new VM\n        + Run `predeleterole` on the previous node\n        + Shut down the previous node.\n      + If `canary=start`, only the first node is redeployed.  If `canary=finish`, only the remaining (non-first) nodes are redeployed.  
If `canary=none`, all nodes are redeployed.\n      + If `canary=filter`, you must also pass `canary_filter_regex=regex` where `regex` is a pattern that matches the hostnames of the VMs that you want to target.\n      + If the process fails for any reason, the old VMs are reinstated, and any new VMs that were built are stopped (rollback).\n      + To delete the old VMs, either set `-e canary_tidy_on_success=true`, or call `redeploy.yml` with `-e canary=tidy`.\n  + **_scheme_addallnew_rmdisk_rollback**\n      + _Supports redeploying to bigger or smaller clusters_\n      + If `canary=start` or `canary=none`:\n        + A full mirror of the cluster is deployed.\n      + If `canary=finish` or `canary=none`:\n        + `predeleterole` is called with a list of the old VMs.\n        + The old VMs are stopped.\n      + If `canary=filter`, an error message will be shown, as this scheme does not support it.\n      + If the process fails for any reason, the old VMs are reinstated, and the new VMs are stopped (rollback).\n      + To delete the old VMs, either set `-e canary_tidy_on_success=true`, or call `redeploy.yml` with `-e canary=tidy`.\n  + **_scheme_rmvm_keepdisk_rollback**\n      + Redeploys the nodes one by one, and moves the secondary (non-root) disks from the old node to the new one (note: only non-ephemeral disks can be moved).\n      + _Cluster node topology must remain identical.  More disks may be added, but none may change or be removed._\n      + **It assumes a resilient deployment (it can tolerate one node being removed from the cluster).**\n      + For each node in the cluster:\n        + Run `predeleterole`\n        + Stop the node\n        + Detach the disks from the old node\n        + Run the main `cluster.yml` to create a new node\n        + Attach the disks to the new node\n      + If `canary=start`, only the first node is redeployed.  If `canary=finish`, only the remaining (non-first) nodes are replaced.  
If `canary=none`, all nodes are redeployed.\n      + If `canary=filter`, you must also pass `canary_filter_regex=regex` where `regex` is a pattern that matches the hostnames of the VMs that you want to target.\n      + If the process fails for any reason, the old VMs are reinstated (and the disks reattached to the old nodes), and the new VMs are stopped (rollback).\n      + To delete the old VMs, either set `-e canary_tidy_on_success=true`, or call `redeploy.yml` with `-e canary=tidy`.\n  + **_noredeploy_scale_in_only**\n    + A special 'not-redeploy' scheme, which scales in a cluster without needing to redeploy every node.\n    + For each node in the cluster:\n      + Run `predeleterole` on the node\n      + Shut down the node.\n    + If `canary=start`, only the first node is shut down.  If `canary=finish`, only the remaining (non-first) nodes are shut down.  If `canary=none`, all nodes are shut down.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsky-uk%2Fclusterverse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsky-uk%2Fclusterverse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsky-uk%2Fclusterverse/lists"}