{"id":25115732,"url":"https://github.com/devops-rob/vault-production-readiness-checklist","last_synced_at":"2026-01-12T02:10:07.040Z","repository":{"id":110463094,"uuid":"242476356","full_name":"devops-rob/Vault-Production-Readiness-Checklist","owner":"devops-rob","description":"This checklist will prepare you launch production-ready vault clusters into any major Cloud provider or on Premise","archived":false,"fork":false,"pushed_at":"2020-03-16T00:30:47.000Z","size":3961,"stargazers_count":23,"open_issues_count":0,"forks_count":9,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-08T02:35:29.290Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devops-rob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-23T07:41:06.000Z","updated_at":"2023-03-08T13:29:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"368bb1ce-7bb8-4905-a743-10acad3b7940","html_url":"https://github.com/devops-rob/Vault-Production-Readiness-Checklist","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devops-rob%2FVault-Production-Readiness-Checklist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devops-rob%2FVault-Production-Readiness-Checklist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devops-rob%2FVault-Production-Readiness-Checklist/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devops-rob%2FVault-Production-Readiness-Checklist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devops-rob","download_url":"https://codeload.github.com/devops-rob/Vault-Production-Readiness-Checklist/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246811183,"owners_count":20837745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-08T02:33:51.038Z","updated_at":"2026-01-12T02:10:02.022Z","avatar_url":"https://github.com/devops-rob.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vault Production Readiness Checklist\nThis checklist will prepare you to launch production-ready Vault clusters on any major cloud provider or on premises.\n\n1. [Infrastructure Architecture](#Infrastructure-Architecture)\n1. [Consul Storage Backend](#Consul-Storage-Backend)\n1. [Load Balancing](#Load-Balancing)\n1. [Monitoring and Alerting](#Monitoring-and-Alerting)\n1. [Configuration Management](#Configuration-Management)\n1. [Vault Configuration](#Vault-Configuration)\n1. [Identity \u0026 Access Management](#Identity-\u0026-Access-Management)\n1. [Security Hardening](#Security-Hardening)\n1. [Operational Readiness](#Operational-Readiness)\n1. [Observability](#Observability)\n1. 
[Governance and compliance](#Governance-and-compliance)\n\n\n### **Infrastructure Architecture**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eInfrastructure Workloads spread across Availability Zones\u003c/summary\u003e \u003cp\u003e Nodes in Vault clusters (and Consul clusters, if Consul is used as the storage backend) should be spread across two or more failure domains, known as Availability Zones. The loss of a single Availability Zone should not result in a loss of service. \u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eMachine Images Created (Virtual Machines only)\u003c/summary\u003e \u003cp\u003e If you are deploying your Vault nodes on virtual machines, it is recommended to build reusable VM images that can be used to create cluster nodes in an immutable way.  Tools like [HashiCorp Packer](https://packer.io/) are designed to help build repeatable machine images for most virtualised and cloud platforms. Machine images should be versioned and should follow a release cycle as new images are produced.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eReplication Enabled (Enterprise only)\u003c/summary\u003e \u003cp\u003e If you are using the Enterprise version of Vault, you can enable replication between two or more Vault clusters in different geographical regions for added protection in Disaster Recovery scenarios.  Replication can be configured in Disaster Recovery mode or Performance Replication mode.  If you are planning on using Replication, you need to provision infrastructure in an alternative region, with nodes spread across multiple Availability Zones. 
For more information about the Enterprise Replication feature, see [the official documentation.](https://www.vaultproject.io/docs/internals/replication/) \u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eFirewall rules configured to control access to Vault\u003c/summary\u003e \u003cp\u003e Vault will likely contain business-critical secrets, which makes it a prime target for malicious actors. Access to Vault should be restricted to your private networks and should not be reachable from the internet.  A Virtual Private Network is a commonly used approach to allow access to Vault from outside those networks.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eCompute Resources satisfy the minimum requirements\u003c/summary\u003e \u003cp\u003e Ensure hardware servers and Virtual Machines have been appropriately resourced in accordance with the [Deployment System Requirements](https://learn.hashicorp.com/vault/operations/ops-reference-architecture#deployment-system-requirements) \u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eSecondary Disk\u003c/summary\u003e \u003cp\u003eEnsure that Vault servers have a secondary disk attached to them. This will help with audit device fault tolerance.\u003c/p\u003e \u003c/details\u003e |\n\n### **Consul Storage Backend**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eStorage Backend Architecture\u003c/summary\u003e \u003cp\u003eIt is a recommended pattern to use [HashiCorp Consul's](https://www.consul.io/) Key/Value store as the storage backend. The recommended cluster size for Consul is 5 nodes.  
This cluster size allows for fault tolerance whilst performing maintenance on a node.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul Gossip Encryption configured\u003c/summary\u003e \u003cp\u003e Members of the Consul clusters use a gossip protocol to manage cluster membership and communicate with each other. This network traffic should be encrypted to minimise security risks.  You can read more about Consul encryption [here.](https://www.consul.io/docs/agent/encryption.html) \u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eACLs\u003c/summary\u003e \u003cp\u003eThe path that Vault uses in Consul's key/value storage to store its encrypted data should be protected using Consul's ACL system. Once configured, the Management ACL token should be revoked.  You can read more about configuring Consul's ACL system [here.](https://www.consul.io/docs/acl/index.html)  \u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul Backups scheduled\u003c/summary\u003e \u003cp\u003eAs Consul is the data store that Vault uses, it should be considered a stateful service and, as such, should have a backup strategy.  Consul snapshots, in addition to disk backups, should be taken on a regular schedule. For more information about Consul snapshot, click [here](https://www.consul.io/docs/commands/snapshot.html)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul Clients installed on Vault Nodes\u003c/summary\u003e \u003cp\u003eVault should not talk directly to the Consul backend, as this increases the attack surface.  Instead, Consul should be installed on the Vault servers and configured in client mode. 
The clients will facilitate the communication between Vault and Consul.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul UI enabled (Optional)\u003c/summary\u003e \u003cp\u003eIf using a 5-node Consul cluster, you can choose to enable the UI; however, it is recommended that the UI is enabled on two nodes only.\u003c/p\u003e \u003c/details\u003e |\n\n### **Load Balancing**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eTLS Encryption is configured on Load Balancer\u003c/summary\u003e \u003cp\u003e Vault’s communications should be encrypted end-to-end with TLS, and this should not be terminated at the load balancer layer. The load balancer should also use the same encryption to communicate with Vault.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eHTTP Redirects configured\u003c/summary\u003e \u003cp\u003e With TLS configured, all traffic going via HTTPS will be encrypted; however, we need to ensure that there are no connections to Vault via HTTP. The load balancer should be configured to redirect all HTTP traffic to HTTPS.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eHealth checks enabled and configured\u003c/summary\u003e \u003cp\u003eLoad balancer health probes can be used to ensure that traffic is only routed to a healthy leader node. 
Configure routing rules according to [these response codes](https://www.vaultproject.io/api/system/health.html) \u003c/p\u003e \u003c/details\u003e |\n\n\n### **Monitoring \u0026 Alerting**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eVault Telemetry Enabled.\u003c/summary\u003e \u003cp\u003eVault telemetry should be configured in the telemetry stanza within the Vault config file. This will enable monitoring and alerting with a wide range of open source tools (for example, Telegraf and Prometheus).\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul Telemetry Enabled.\u003c/summary\u003e \u003cp\u003eConsul telemetry should be configured in the telemetry stanza within the Consul config file. This will enable monitoring and alerting with a wide range of open source tools (for example, Telegraf and Prometheus).\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eVault platform monitoring configured\u003c/summary\u003e \u003cp\u003eA monitoring system of your choice is configured to monitor and alert on Vault application metric thresholds as per the [best practice guidance of HashiCorp.](https://s3-us-west-2.amazonaws.com/hashicorp-education/whitepapers/Vault/Vault-Consul-Monitoring-Guide.pdf)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul platform Monitoring Configured\u003c/summary\u003e \u003cp\u003eA monitoring system of your choice is configured to monitor and alert on Consul application metric thresholds as per the [best practice guidance of HashiCorp.](https://s3-us-west-2.amazonaws.com/hashicorp-education/whitepapers/Vault/Vault-Consul-Monitoring-Guide.pdf)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eInfrastructure 
Monitoring Configured\u003c/summary\u003e \u003cp\u003eA monitoring system of your choice is configured to monitor and alert on infrastructure metric thresholds as per the [best practice guidance of HashiCorp.](https://s3-us-west-2.amazonaws.com/hashicorp-education/whitepapers/Vault/Vault-Consul-Monitoring-Guide.pdf)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eMonitoring Dashboard Created\u003c/summary\u003e \u003cp\u003eUsing a dashboard tool of your choice, create a monitoring dashboard so that operations staff can easily identify any issues that may be occurring.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Configuration Management**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eInfrastructure as code written (Virtual Machines only)\u003c/summary\u003e \u003cp\u003e Code is written to deploy the infrastructure for Consul and Vault. [Terraform](https://www.terraform.io/) is an appropriate tool for this task.  Virtual Machine images for Consul and Vault are created from code. Packer is a good choice of tool for this. All virtual infrastructure should be deployed and managed using an Infrastructure as Code tool.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eVault Platform Configuration Code Written\u003c/summary\u003e \u003cp\u003e Vault platform configuration should be described in code using a tool like [Terraform](https://www.terraform.io/).  
Configuration such as Auth Methods, Secrets Engines, Audit Devices and Policies should all be configured using code.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eCode under Version Control in Source Code Repositories\u003c/summary\u003e \u003cp\u003eAll infrastructure code and application code should be stored in separate source control repositories and be placed under version control. An appropriate branching strategy should be implemented and documented in the README file.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eCode Owners in repositories\u003c/summary\u003e \u003cp\u003eRepository files should have code owners assigned to them to control who can approve Pull Requests that will be merged into the master branch.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eRepository rules implemented\u003c/summary\u003e \u003cp\u003eConfigure the minimum number of Pull Request approvers, restrictions on Pull Request authors approving their own requests, and any other rules that your organisation’s security standards require for integrity.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eDeployment pipelines implemented\u003c/summary\u003e \u003cp\u003eCode deployments should be automated using deployment pipelines. Where possible, the pipeline should be written as code and stored under version control with the code.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eDev/Staging environments created\u003c/summary\u003e \u003cp\u003eCreate development and staging environments for Vault.  
The staging environment should be identical to production, diverging only while pre-production changes are undergoing final testing prior to being deployed to production.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eNaming and coding standards established\u003c/summary\u003e \u003cp\u003eImplement and document naming and coding standards. Naming standards should cover Namespaces, Policies, Vault Roles, secret keys and AppRoles.  Coding standards should cover variable names and function names where applicable.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eIntegration tests written\u003c/summary\u003e \u003cp\u003eA suite of automated integration tests is written, to be run either during the deployment pipeline or as a pre-check on your chosen VCS that must pass before a Pull Request can be merged.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Vault Configuration**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eEnable audit device on 2 files on different disks\u003c/summary\u003e \u003cp\u003eVault logs all requests and the responses to those requests. If Vault is unable to write to any of its enabled audit devices, it will immediately cease operations. 
To provide redundancy, each Vault node should have 2 file audit devices enabled on separate volumes on separate disks.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eLog rotation configured on log files\u003c/summary\u003e \u003cp\u003eEnable and configure log rotation on the audit files to ensure the disks do not fill up and cause a Vault outage.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConfigure at least one IDP as an auth method\u003c/summary\u003e \u003cp\u003eWhere appropriate, configure an existing identity provider (or multiple if required) as an authentication method in Vault.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConfigure the required secrets engines\u003c/summary\u003e \u003cp\u003eIdentify and enable the required secrets engines for your business and technical use cases.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eEnsure KV v2 is enabled\u003c/summary\u003e \u003cp\u003eEnsure that Version 2 of the KV secrets engine is used to enable secrets versioning.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Identity \u0026 Access Management**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eCreate default policies\u003c/summary\u003e \u003cp\u003eCreate default policies that all user entities will inherit according to your business security model.  
This could be list permissions on a particular KV path, for example.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eCreate Policy mappings for default policies\u003c/summary\u003e \u003cp\u003eCreate a mapping for default policies to ensure all user entities inherit these policies.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConfigure aliases for entities when more than one auth method is in use\u003c/summary\u003e \u003cp\u003eUsing the Identity secrets engine, create aliases to attach Vault logins via different auth methods to a single entity, to ensure the correct policies are inherited and to make the logging data easier to mine.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eDesign a path structure for KV v2 that matches the way your org works (team based or product/service based)\u003c/summary\u003e \u003cp\u003eMap your KV path design to the way your organisation works or to its product groupings.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eMeta AppRole process defined\u003c/summary\u003e \u003cp\u003eMeta AppRoles are a mechanism that allows an application or service to read the secret ID of an AppRole without exposing it to application developers.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Security Hardening**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eEnsure TLS is configured on Vault Cluster\u003c/summary\u003e \u003cp\u003eEnable end-to-end encryption using TLS certificates.  
Vault agents should also use TLS certificates.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eEnsure TLS is configured on Consul Clusters\u003c/summary\u003e \u003cp\u003eEnable end-to-end encryption on the Consul cluster and agents. More information can be found [here.](https://www.consul.io/docs/agent/encryption.html)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eEnable and configure SELinux / AppArmor\u003c/summary\u003e \u003cp\u003eEnable and configure SELinux or AppArmor, depending on your operating system, to create sandboxed contexts that reduce the blast radius if the system is compromised.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eRandomise the ports used to differ from standard ports for Vault\u003c/summary\u003e \u003cp\u003eBy default, Vault uses ports 8200 and 8201. Change these to non-standard ports to provide extra hardening.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eRandomise the ports used to differ from standard ports for Consul\u003c/summary\u003e \u003cp\u003eBy default, Consul uses ports 8500 and 8501. 
Change these to non-standard ports to provide extra hardening.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eRevoke root token\u003c/summary\u003e \u003cp\u003eOnce the initial set-up of the Vault cluster has been completed, the root token should be revoked.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eRevoke Consul Management ACL token\u003c/summary\u003e \u003cp\u003eOnce the initial set-up of the Consul ACL system has been completed, the management token should be revoked.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConfigure server firewalls to only allow access to required ports.\u003c/summary\u003e \u003cp\u003eUsing firewalld or iptables, configure host firewalls to limit port access to the Vault and Consul servers.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eDisable SSH\u003c/summary\u003e \u003cp\u003eInteraction with Vault is done via the API, even when using the CLI.  As such, there is no reason to SSH on to a Vault server (if it’s a virtual machine), so SSH should be disabled to mitigate the risk of unauthorised access to the server.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Operational Readiness**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConfigure auto-unseal\u003c/summary\u003e \u003cp\u003eAdd a seal stanza to the Vault config file to reduce operational burden on operators. 
For more information, check the [auto-unseal documentation here](https://www.vaultproject.io/docs/concepts/seal/#auto-unseal)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003ePGP encryption of unseal/recovery keys\u003c/summary\u003e \u003cp\u003eUse PGP or Keybase to add an extra layer of security to the distribution of unseal/recovery keys. For more details, see the [official documentation here](https://www.vaultproject.io/docs/concepts/pgp-gpg-keybase/)\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eBackup / restore practice run\u003c/summary\u003e \u003cp\u003ePractice restoring your Vault platform from a Consul snapshot.  Your backup strategy isn’t complete until you have tested this.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eNode rebuild practice run\u003c/summary\u003e \u003cp\u003ePractice building and replacing a node in the Vault and Consul clusters with zero downtime.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eVault upgrade practice run\u003c/summary\u003e \u003cp\u003ePractice upgrading Vault binaries to newer versions with zero downtime.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eConsul upgrade practice run\u003c/summary\u003e \u003cp\u003ePractice upgrading Consul binaries to newer versions with zero downtime.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eLoad testing\u003c/summary\u003e \u003cp\u003eConduct load testing to ensure your infrastructure compute resources are sufficient for the load you are expecting. 
There are projects like [wrk](https://github.com/wg/wrk) that can assist with generating traffic.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eDocument key holders and contact details\u003c/summary\u003e \u003cp\u003eEnsure unseal/recovery key holders are documented on a wiki and that this document is kept up-to-date.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eTrusted Broker/Platform pattern\u003c/summary\u003e \u003cp\u003eChoose a platform or broker that your business trusts and use this for secure injection of initial secrets. Examples are using Azure as a trusted platform or using Jenkins as a trusted broker.  Each organisation will differ with regard to what it trusts, so this should be a business-driven decision.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eOS Patching strategy\u003c/summary\u003e \u003cp\u003eDocument and implement an OS patching strategy, whether it’s updating VM images and replacing VMs with up-to-date images or a controlled direct-access update by an operator.\u003c/p\u003e \u003c/details\u003e |\n\n\n### **Observability**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eLogs shipping to central logs data warehouse\u003c/summary\u003e \u003cp\u003eLogs should be streamed to a central data warehouse, because log rotation should be enabled on the servers and rotated logs will be lost locally. A platform like Splunk is ideal for this use case.  
There are other viable options available.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eLogs data mining scripts written.\u003c/summary\u003e \u003cp\u003eDecide the value that the log data should provide and write some scripts to extract this value from the data. Scripts can be written in Python.  Models can also be produced to predict future loads based on existing data sets.  This kind of insight can be useful for planning.\u003c/p\u003e \u003c/details\u003e |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eLogs alerting configured\u003c/summary\u003e \u003cp\u003eSome events should generate an alert; for example, a root token being generated should be flagged and alerted on. Ensure these events have alerts configured for them.\u003c/p\u003e \u003c/details\u003e |\n\n### **Governance and compliance**\n\n|  |  |\n| --------- | ------- |\n| \u0026#9744;   | \u003cdetails\u003e\u003csummary\u003eThreat Model Exercise \u003c/summary\u003e \u003cp\u003eConduct a threat modelling exercise using a framework of your organisation’s choosing and ensure you have documented and mitigated all identified threats.\u003c/p\u003e \u003c/details\u003e |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevops-rob%2Fvault-production-readiness-checklist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevops-rob%2Fvault-production-readiness-checklist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevops-rob%2Fvault-production-readiness-checklist/lists"}