{"id":19259231,"url":"https://github.com/concourse/infrastructure","last_synced_at":"2025-02-23T18:17:40.949Z","repository":{"id":48149969,"uuid":"259370825","full_name":"concourse/infrastructure","owner":"concourse","description":"Automation stack for the Concourse project's infrastructure.","archived":false,"fork":false,"pushed_at":"2024-10-22T18:03:21.000Z","size":747,"stargazers_count":8,"open_issues_count":8,"forks_count":7,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-01-05T09:30:20.885Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/concourse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-27T15:20:40.000Z","updated_at":"2024-10-22T18:03:26.000Z","dependencies_parsed_at":"2023-10-27T03:34:34.268Z","dependency_job_id":"ebf713b9-5a18-47e7-a466-006290ae9de0","html_url":"https://github.com/concourse/infrastructure","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concourse%2Finfrastructure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concourse%2Finfrastructure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concourse%2Finfrastructure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/concourse%2Finfrastructure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/concourse","downloa
d_url":"https://codeload.github.com/concourse/infrastructure/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240356193,"owners_count":19788513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T19:15:48.518Z","updated_at":"2025-02-23T18:17:40.910Z","avatar_url":"https://github.com/concourse.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# greenpeace\n\nautomate everything\n\n## deploying from zero\n\n### requirements\n\n```sh\n$ gcloud version\nGoogle Cloud SDK 291.0.0\nbq 2.0.57\ncore 2020.05.01\ngsutil 4.50\n\n$ terraform version\nTerraform v0.14.7\n\n$ jq --version\njq-1.6\n\n$ ytt --version\nytt version 0.31.0\n```\n\n...but you can probably get away with slightly different versions\n\n### bootstrapping\n\nFirst, authenticate your account with `gcloud auth`:\n\n```sh\ngcloud auth application-default login\n```\n\nThis will save your GCP application-default credentials in a JSON file under\n`~/.config/gcloud`, which the `terraform` CLI will automatically use.\n\nNext, run:\n\n```sh\n./bootstrap/setup\n```\n\nThis will create the following:\n\n1. The `concourse-greenpeace` bucket, which will store the Terraform state\n   for all of our deployments.\n\n1. 
A `greenpeace-terraform` service account, which has permissions to write\n   to the bucket and perform various operations within the GCP project.\n\nAfter these have been created, it will revoke any existing keys for the\nservice account and generate a new one, placing it under `sensitive/`.\n\nThe script will then fetch the `concourse_bot_private_key` secret from GCP\nSecret Manager. This secret is necessary for pulling the Greenpeace repo in\nthe bootstrapping pipeline, since this repo is private.\n\nIt will also prompt you to ensure that the required secrets have been added\nto GCP Secret Manager. The following secrets must be created:\n\n* `production-ci-github_client_id` - the client ID of the GitHub application\n  for authenticating with the CI concourse deployment\n* `production-ci-github_client_secret` - the client secret of the GitHub\n  application for authenticating with the CI concourse deployment\n* `dispatcher-concourse-github_client_id` - the client ID of the GitHub\n  application for authenticating with the concourse deployment in the\n  dispatcher cluster\n* `dispatcher-concourse-github_client_secret` - the client secret of the GitHub\n  application for authenticating with the concourse deployment in the\n  dispatcher cluster\n\nNote: after all this is done, the `bootstrap/terraform.tfstate` file needs to\nbe checked in. (Be careful not to have any credentials as outputs.)\n\n#### bootstrap credentials\n\nIn order for the deployments to operate, they require their vault instance to\nbe populated with credentials. 
Each environment has its own set of credentials\nthat can diverge, but `production`'s is the source of truth for new clusters.\n\nEach environment stores its vault data in the [greenpeace bucket] -\nspecifically, at the path `vault/ENVIRONMENT/data.tar`.\n\nFor the production cluster, this can be first created by exporting from an\nexisting vault instance by running `./scripts/export-secrets` (see [managing\nsecrets]).\n\nFor other environments, this can be created by syncing with the production\ncluster by triggering the job `sync-secrets-with-production` in the pipeline.\n\n### deploy the dispatcher\n\nThe next step is to deploy the `dispatcher` cluster. This cluster is solely\nresponsible for continuously deploying the `production` cluster.\n\n`dispatcher` is deployed through a Concourse pipeline. If this is the first\ndeploy, you should run this pipeline on a local Concourse:\n\n```sh\nytt -v cluster=dispatcher -f pipelines/greenpeace.yml -f pipelines/data.yml | \\\n  fly -t dev set-pipeline \\\n    -l sensitive/vars.yml \\\n    -p dispatcher-greenpeace \\\n    -c -\n```\n\nAfter the `production` cluster is up, the pipeline can be run from CI to update\nConcourse on the `dispatcher`.\n\n`dispatcher`'s Concourse can be accessed at [`dispatcher.concourse-ci.org`](https://dispatcher.concourse-ci.org/).\n\nAfter deploying the `dispatcher`, you should set the reconfigure pipeline:\n\n```sh\nfly -t dispatcher sp -p reconfigure -c pipelines/reconfigure.yml\n```\n\nThis will configure the `production` pipeline, which will trigger automatically\nand create the production cluster. Once the `terraform` job completes, you should set the reconfigure pipeline on CI:\n\n```sh\nfly -t ci sp -p reconfigure-pipelines -c ~/workspace/ci/pipelines/reconfigure.yml\n```\n\nThis will bootstrap the initial pipelines and teams. 
Note that there may be a\nrace condition between creating the initial pipelines and teams - if the\nreconfigure jobs error, they should pass on a rerun.\n\n### restoring the CI db\n\nA script can be manually run to restore the old CI DB from a backup. The DB instance ID and backup\nID can be found by running:\n\n```sh\n$ gcloud sql instances list\n$ gcloud sql backups list -i [instance_id]\n```\n\nThe old encryption key can be retrieved by getting the helm values, inspecting the pod and the env\nvars, or exec'ing onto the pod and grabbing it from there.\n\n```sh\n$ ./scripts/restore-db-backup [src_instance_id] [backup_id] [old_encryption_key]\n```\n\nIf the DB was restored but the re-encryption part fails, it can be retried by running:\n\n```sh\n$ ./scripts/reencrypt-db [old_encryption_key]\n```\n\nNote that the disk capacity of the new DB must be at least as large as the disk capacity of the old DB.\nSee https://cloud.google.com/sql/docs/postgres/backup-recovery/restore#tips-restore-different-instance\n\n### managing secrets\n\nThe source-of-truth of secrets for new clusters is stored in\n`gs://concourse-greenpeace/vault/production/data.tar`. 
When an environment is\ndeployed, this data (containing all of the secrets in vault) is imported into\nthe new vault.\n\nThere are some helper scripts for managing secrets:\n\n* `scripts/connect-to-vault` gives you a shell on the vault pod for an\n  environment\n  * Once you have a shell, you can run `vault read ...` and `vault write ...`\n  * You must provide the environment name you want to connect to, e.g.\n    `./scripts/connect-to-vault production`\n* `scripts/export-secrets` exports secrets under `concourse/` from the vault\n  instance at `$VAULT_ADDR` and generates an encrypted bundle that can be\n  uploaded to GCS\n  * This can be used to generate the bundle initially\n  * You need to set `VAULT_ADDR` and `VAULT_TOKEN`, but will also probably need\n    to set `VAULT_SKIP_VERIFY=1`\n  * You'll also need to port-forward the vault instance via `kubectl\n    port-forward -n vault svc/vault 8200:8200` (if you're exporting from one of\n    our vaults)\n  * The command will generate a `gsutil` command to upload the encrypted bundle\n    to `gs://concourse-greenpeace/vault/production/data.tar`\n\n#### encryption\n\nThere are two encryption keys used to encrypt the vault data:\n\n1. A randomly generated 32-byte sequence for encrypting the data, and\n2. A Google KMS crypto key for encrypting the aforementioned key\n\nThis is because KMS crypto keys can only encrypt small payloads.\n\n#### vault ca cert expires\n\nWhen the vault ca cert expires, it is automatically re-created through the terraform job for that environment. For example, if the dispatcher vault ca cert expires, it would be re-created through the [terraform job in the dispatcher pipeline](https://ci.concourse-ci.org/teams/main/pipelines/dispatcher-greenpeace/jobs/terraform). 
Once this terraform job has deleted the old ca cert, created a new one and run successfully, you might have to restart the vault pod manually using `kubectl delete pod -n \u003cnamespace\u003e vault-0`.\n\nIn strange cases, the terraform job won't actually recreate the expired vault CA certs though.  i, that is, clarp tincan, haven't figured out what causes it, but was able to develop a procedure to fix it at the very least.  You might not ever need these steps, but hopefully they'll save you a LOT of time and turmoil if you ever do.  The basic goal is to use [the new `-replace` flag in Terraform 1.5](https://developer.hashicorp.com/terraform/cli/commands/plan#replace-address), though it takes a bit of work to get there:\n* Install terraform 1.5+ on your local machine.  i did this on v1.5.4.\n* `cd` to whichever `terraform/environments/[deployment]` directory has the expired vault CA cert.\n* Delete the `.terraform.lock.hcl` file and `.terraform/` folder, if present.\n* Run the following to end up with a 1.5.x-syntaxed version of the above files/folders:\n    * `terraform init`\n    * `gcloud container clusters get-credentials [deployment]`\n    * `terraform workspace select [deployment]`\n* Rename the `variables.yml` file to `variables.tfvars`, and make the following syntax changes to make it 1.5 compatible:\n    * Change all the \":\" separators to \"=\".\n    * Put double quotes around all the argument values.\n* Go to the CF-Concourse-Production GCP project, and make an IAM user with the following roles: Editor, Secret Manager Secret Accessor.  Save its json secrets locally, because we'll need it in a second.\n* Run this command to (hopefully) force terraform to rotate the cert and all the certs that derive from it.  You'll know if it worked because it'll say something like \"9 replaced\" and show the changes to the CA cert and its derivatives.  
\n    * `terraform apply -var 'concourse_chart_version=17.1.1' -var 'vault_root_ca_validity_period=87600' -var credentials=\"$(cat [path_to_iam_user_json_secrets_file])\" -var-file='variables.tfvars' -replace='module.vault.tls_self_signed_cert.ca'`\n* To get the vault container to pick up the new cert, you'll next have to delete the vault pods currently running in Kubernetes as outlined at the beginning of this section.  When they get remade, they should be healthy.\n* If both production and dispatcher had expired CA certs, you've been in a deadlock situation this whole time where neither CI system can access either Vault.  This makes it impossible to use CI to replace the old certs.  If you're in this state, you'll have to break the deadlock yourself by manually replacing each cert.  Do this procedure twice, once for dispatcher and once for production:\n    * Get the deployment's Vault root key from the CF-Concourse-Production GCP project:\n        ```bash\n        gsutil cat \"gs://concourse-greenpeace/vault/[DEPLOYMENT]/root-token.enc\" | \\\n          base64 --decode | \\\n            gcloud kms decrypt \\\n              --key \"projects/cf-concourse-production/locations/global/keyRings/greenpeace-kr/cryptoKeys/greenpeace-key\" \\\n              --ciphertext-file - \\\n              --plaintext-file -\n        ```\n    * SSH into the Vault K8s container: `scripts/connect-to-vault [deployment]`\n    * Log into the Vault using the decrypted Vault root token: `vault login token=[root_token]`\n    * Get the expired root CA definition: `vault read auth/cert/certs/concourse`\n    * As a safety-check, manually copy-paste that root CA into a text file and open it up.  
Use the issue and expiration fields to confirm this is indeed the failing cert.\n        * `openssl x509 -noout -text -in '[cert_file_path]'`\n        * Keep this file until you've successfully rotated certs, just to be on the safe side.\n    * Because you recreated the Vault K8s container earlier, it contains a local copy of the new cert.  Use the environment variable `VAULT_CACERT` to find its location.  Open it up using the same openssl command and validate that the issue / expiry dates look correct.\n        * Just in case the environment isn't working, as of time of writing (10/17/24): `VAULT_CACERT=/vault/userconfig/vault-server-tls/vault.ca`\n    * Now that you've validated the new cert, write it to Vault with this command: `vault write auth/cert/certs/concourse \"policies=concourse\" \"certificate=$(cat $VAULT_CACERT)\"`\n    * Wait a couple of minutes for the changes to propagate. Resource checks in CI should now succeed.\n    * Make sure to repeat these steps with the other deployment.\n* Then finally, run the `initialize-vault` job from the opposite-deployment you updated.  So if you're fixing production, you'd run the job from dispatcher and vice versa.\n\nIf this didn't work, consider crying.\n\nIf crying didn't work, consider weeping profusely.\n\nIf the problem still persists, welp.\n\nAlso worth noting, here are some things that didn't work for me:\n* Just deleting the old, expired cert from GCP and running the terraform-dispatcher job from the opposite deployment's ci. i forget what happened, but i think the job just complains that the file doesn't exist and fails.\n* Downloading the deployment's tfstate (gcp://CF-Concourse-Production/concourse-greenpeace/terraform/[deployment].tfstate), deleting the CA cert by hand, and reuploading it. Terraform just somehow restores the old cert and keeps using it - it doesn't trigger it to get recreated. 
i tried like every combination of deleting entire fields or just deleting values in the entire file, and nothing worked.\n* Simply deleting and recreating K8s containers isn't enough to get the new certs into Vault itself.  Your only options are to access Vault directly and manually replace the certs, or delete the entire Vault instance, recreate it from scratch, restore its contents using backups from GCP, and hope you haven't lost anything important.  \n\n\n\n\n[greenpeace bucket]: https://console.cloud.google.com/storage/browser/concourse-greenpeace\n[managing secrets]: #managing-secrets\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconcourse%2Finfrastructure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconcourse%2Finfrastructure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconcourse%2Finfrastructure/lists"}