{"id":16442285,"url":"https://github.com/sungchun12/airflow-toolkit","last_synced_at":"2025-08-01T06:07:31.401Z","repository":{"id":49058593,"uuid":"275052988","full_name":"sungchun12/airflow-toolkit","owner":"sungchun12","description":"Any Airflow project day 1, you can spin up a local desktop Kubernetes Airflow environment AND one in Google Cloud Composer with tested data pipelines(DAGs) :desktop_computer:   \u003e\u003e [ :rocket:, :ship:  ]","archived":false,"fork":false,"pushed_at":"2023-09-21T20:39:05.000Z","size":12939,"stargazers_count":113,"open_issues_count":2,"forks_count":31,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-07-15T22:03:10.867Z","etag":null,"topics":["actions","airflow","airflow-environments","airflow-toolkit","cloud","cloud-composer","composer","dbt","docker","gcp","google-cloud","hcl","kubernetes","kubernetes-deployment","python","python3","shell-script","terraform","terragrunt","terragrunt-deployment"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sungchun12.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-06-26T01:55:52.000Z","updated_at":"2025-07-08T01:05:10.000Z","dependencies_parsed_at":"2025-07-15T11:31:13.155Z","dependency_job_id":"bcec4936-1102-46ec-8a37-fb956d8670c0","html_url":"https://github.com/sungchun12/airflow-toolkit","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sungchun12/airflow-toolkit","repository_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungchun12%2Fairflow-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungchun12%2Fairflow-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungchun12%2Fairflow-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungchun12%2Fairflow-toolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sungchun12","download_url":"https://codeload.github.com/sungchun12/airflow-toolkit/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungchun12%2Fairflow-toolkit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268177814,"owners_count":24208396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions","airflow","airflow-environments","airflow-toolkit","cloud","cloud-composer","composer","dbt","docker","gcp","google-cloud","hcl","kubernetes","kubernetes-deployment","python","python3","shell-script","terraform","terragrunt","terragrunt-deployment"],"created_at":"2024-10-11T09:16:53.142Z","updated_at":"2025-08-01T06:07:31.335Z","avatar_url":"https://github.com/sungchun12.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readm
e":"![repo_log](/docs/repo_logo.png)\n\n\u003e `SAME AIRFLOW DATA PIPELINES | WHEREVER YOU RUN THEM`\n\n# airflow-toolkit :rocket:\n\n![Terragrunt Deployment-Validate Syntax and Plan](https://github.com/sungchun12/airflow-toolkit/workflows/Terragrunt%20Deployment-Validate%20Syntax%20and%20Plan/badge.svg) ![Checkov-Terraform Security Checks](https://github.com/sungchun12/airflow-toolkit/workflows/Checkov-Terraform%20Security%20Checks/badge.svg)\n\nAny Airflow project day 1, you can spin up a local desktop Kubernetes Airflow environment AND a Google Cloud Composer Airflow environment with working example DAGs across both :sparkles:\n\n## Motivations\n\nIt is a painful exercise to setup secure airflow environments with parity(local desktop, dev, qa, prod). Too often, I've done all this work in my local desktop airflow environment only to find out the DAGs don't work in a Kubernetes deployment or vice versa. As I got more hands-on with infrastructure/networking, I was performing two jobs: Data and DevOps engineer. Responsibilities overlap and both roles are traditionally ill-equipped to come to consensus. Either the networking specifics go over the Data engineer's head and/or the data pipeline IAM permissions and DAG idempotency go over the DevOps engineer's head. There's also the issue of security and DevOps saying that spinning up an airflow-dev-cloud-environment is too risky without several development cycles to setup bastion hosts, subnets, private IPs, etc. These conversations alone can lead to several-weeks delays before you can even START DRAFTING airflow pipelines! 
It doesn't have to be this way.\n\n**This toolkit is for BOTH Data and DevOps engineers to solve the problems above** :astonished:\n\n**High-Level Success Criteria:**\n\n- Deploy airflow in 4 environments(local desktop, dev, qa, prod) in ONE day with this repo(save 4-5 weeks of development time)\n- Confidence that base DAG integration components work based on successful example DAGs(save 1-2 weeks of development time)\n- Simple setup and teardown for all toolkits and environments\n- It FEELS less painful to iteratively develop airflow DAG code AND infrastructure as code\n- Secure and private environments by default\n- You are inspired to automate other painful parts of setting up airflow environments for others\n\n## Use Cases\n\n**In Scope**\n\n- Airflow works on local computer with docker desktop installed\n- Run meaningful example DAGs with passing unit and integration tests\n- Easily set up and tear down any environment\n- Sync DAGs in real time with local git repo directory(local desktop)\n- Reusable configs in another kubernetes cluster\n- Same DAGs work in both local desktop computer and Google Cloud Composer\n- Secure cloud infrastructure and network(least-privilege access and minimal public internet touchpoints)\n- Low cost/Free(assumes you have Google Cloud free-trial credits)\n\n**Out of Scope**\n\n- Written end-to-end DAG tests, as the dry runs are sufficient for this repo's scope\n- `terratest` for terraform unit testing\n- DAGs relying on external data sources outside this repo\n- Hyper-specific IAM permission mechanics as they depend on your specific situation\n\n## Table of Contents\n\n- Prerequisites\n- Toolkit #1: Local Desktop Kubernetes Airflow Deployment\n- Toolkit #2: Terragrunt-Driven Terraform Deployment to Google Cloud\n- Toolkit #3: Simple Terraform Deployment to Google Cloud\n- Post-Deployment Instructions for Toolkits #2 \u0026 #3\n- Git Repo Folder Structure\n- Frequently Asked Questions(FAQ)\n- Resources\n\n---\n\n---\n\n## 
Prerequisites\n\n\u003e Time to Complete: 5-10 minutes\n\n1. [Sign up for a free trial](https://cloud.google.com/free/?hl=ar) _OR_ use an existing GCP account\n2. Manually FORK the repo through the github interface _OR_ CLONE this repo: `git clone https://github.com/sungchun12/airflow-toolkit.git`\n\n   ![fork git repo](/docs/fork-git-repo.png)\n\n3. Create a new Google Cloud project\n\n   ![create gcp project](/docs/create-gcp-project.gif)\n\n4. Get into starting position for deployment: `cd airflow-toolkit/`\n\n### One Time Setup for All Toolkits\n\n\u003e Time to Complete: 10-15 minutes\n\n- [Download docker desktop](https://www.docker.com/products/docker-desktop) and start docker desktop\n\n- Customize Docker Desktop for the below settings\n- Click `Apply \u0026 Restart` where appropriate\n  ![custom_resources](/docs/custom_resources.png)\n  ![enable_kubernetes](/docs/enable_kubernetes.png)\n\n- Run the below commands in your terminal\n\n```bash\n# install homebrew\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)\"\n\n# install helm, terragrunt, terraform, kubectl to local desktop\nbrew install helm terragrunt terraform kubectl\n\n# Install Google Cloud SDK and follow the prompts\n# https://cloud.google.com/sdk/install\ncurl https://sdk.cloud.google.com | bash\n```\n\n- Close the current terminal and start a new one for the above changes to take effect\n\n- [Create a Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating)\n\n- Add the `Editor`, `Secret Manager Admin`, `Role Administrator`, and `Security Admin` roles\n\n  \u003e Note: this provides wide permissions for the purposes of this demo, this will need to be updated based on your specific situation\n\n- [Enable the Service Account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#enabling)\n\n- [Create a Service Account Key JSON File-should automatically 
download](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys)\n\n- Move private `JSON` key into the root directory of this git repo you just cloned and rename it `account.json`(don't worry it will be officially `gitignored`)\n\n- Run the below commands in your local desktop terminal\n\n```bash\n# Authenticate gcloud commands with service account key file\ngcloud auth activate-service-account --key-file account.json\n\n# Enable in scope Google Cloud APIs\ngcloud services enable \\\n    compute.googleapis.com \\\n    iam.googleapis.com \\\n    cloudresourcemanager.googleapis.com \\\n    bigquery.googleapis.com \\\n    storage-component.googleapis.com \\\n    storage.googleapis.com \\\n    container.googleapis.com \\\n    containerregistry.googleapis.com \\\n    composer.googleapis.com \\\n    secretmanager.googleapis.com\n\n# Store contents of private service account key in Secrets Manager to be used by airflow later within the `add_gcp_connections.py` DAG\n# Create a secrets manager secret from the key\ngcloud secrets create airflow-conn-secret \\\n    --replication-policy=\"automatic\" \\\n    --data-file=account.json\n\n# List the secret\ngcloud secrets list\n\n# verify secret contents ad hoc\ngcloud secrets versions access latest --secret=\"airflow-conn-secret\"\n\n# if you run the below toolkits multiple times, there may be times where you'll have to delete and recreate the secret\ngcloud secrets delete airflow-conn-secret\n\n# Optional: install specific Google Cloud version of kubectl\n# The homebrew installation earlier above will suffice\n# ┌──────────────────────────────────────────────────────────────────┐\n# │               These components will be installed.                
│\n# ├─────────────────────┬─────────────────────┬──────────────────────┤\n# │         Name        │       Version       │         Size         │\n# ├─────────────────────┼─────────────────────┼──────────────────────┤\n# │ kubectl             │             1.15.11 │              \u003c 1 MiB │\n# │ kubectl             │             1.15.11 │             87.1 MiB │\n# └─────────────────────┴─────────────────────┴──────────────────────┘\ngcloud components install kubectl\n\n# Configure Docker\ngcloud auth configure-docker\n\n# Create SSH key pair for secure git clones\nssh-keygen\n```\n\n- Copy and paste contents of command below to your git repo SSH keys section\n\n```bash\ncat ~/.ssh/id_rsa.pub\n```\n\n- [Paste public ssh key contents location](https://github.com/settings/ssh/new)\n\n- Manually create a `cloud source mirror repo` based on the GitHub repo\n\n  \u003e Note: documented to not be possible through the current state API-[further reading](https://issuetracker.google.com/issues/73122477)\n\n  - [Full Instructions to Mirror a GitHub Repository](https://cloud.google.com/source-repositories/docs/mirroring-a-github-repository#create_a_mirrored_repository)\n\n- Replace all relevant variables within `dags/` folder\n\n```python\n# file location\n# /airflow-toolkit/dags/add_gcp_connections.py\nCONN_PARAMS_DICT = {\n    \"gcp_project\": \"wam-bam-258119\", # replace with your specific project\n    \"gcp_conn_id\": \"my_gcp_connection\",\n    \"gcr_conn_id\": \"gcr_docker_connection\",\n    \"secret_name\": \"airflow-conn-secret\",\n}\n\n# file location\n# /airflow-toolkit/dags/bigquery_connection_check.py\nTASK_PARAMS_DICT = {\n    \"dataset_id\": \"dbt_bq_example\",\n    \"project_id\": \"wam-bam-258119\", # replace with your specific project\n    \"gcp_conn_id\": \"my_gcp_connection\",\n}\n\n\n# file location\n# /airflow-toolkit/dags/airflow_utils.py\nGIT_REPO = \"github_sungchun12_airflow-toolkit\" # replace with the cloud source mirror repo name\nPROJECT_ID = 
\"wam-bam-258119\" # replace with your specific project\n```\n\n\u003e After doing the above ONCE, you can run the below toolkits multiple times with the same results(idempotent)\n\n---\n\n---\n\n## Toolkit #1: Local Desktop Kubernetes Airflow Deployment\n\n\u003e Time to Complete: 5-8 minutes\n\n\u003e Note: This was ONLY tested on a Mac desktop environment\n\n### System Design\n\n![local_desktop_airflow.png](/docs/local_desktop_airflow.png)\n\n### Specific Use Cases\n\n- Free local dev environment\n- Customize local cluster resources as needed\n- Rapid DAG development without waiting for an equivalent cloud environment to sync DAG changes\n- Experiment with a wider array of customization and permissions\n- Minimal knowledge of kubernetes and helm required\n\n### How to Deploy\n\n**General Mechanics**\n\n\u003e shell script logs will generate similar mechanics\n\n- Set environment variables\n- Build and push a custom dbt docker image\n- Copy Docker for Desktop Kube Config into git repo\n- Download stable helm repo\n- Setup Kubernetes airflow namespace\n- Create Kubernetes Secrets for the local cluster to download docker images from Google Container Registry based on Service Account\n- Create Kubernetes Secrets for dbt operations based on Service Account and ssh-keygen, to be used later in KubernetesPodOperator\n- Setup and Install Airflow Kubernetes Cluster with Helm\n- Wait for the Kubernetes Cluster to settle\n- Open airflow UI webserver from terminal\n\n---\n\n\u003e Note: I plan to automate this yaml setup in a future feature\n\n- Manually update the extraVolumes section within `custom-setup.yaml`, starting at `line 174`\n- Run `pwd` in your terminal from the root `airflow-toolkit/` directory and replace all the `\u003cYOUR GIT REPO DIRECTORY HERE\u003e` placeholders\n\n```yaml\nextraVolumes: # this will create the volume from the directory\n  - name: dags\n    hostPath:\n      path: \u003cYOUR GIT REPO DIRECTORY HERE\u003e/dags/\n  - name: 
dag-environment-configs\n    hostPath:\n      path: \u003cYOUR GIT REPO DIRECTORY HERE\u003e/dag_environment_configs/\n  - name: kube-config\n    hostPath:\n      path: \u003cYOUR GIT REPO DIRECTORY HERE\u003e/.kube/\n  - name: service-account\n    hostPath:\n      path: \u003cYOUR GIT REPO DIRECTORY HERE\u003e/account.json\n  - name: tests\n    hostPath:\n      path: \u003cYOUR GIT REPO DIRECTORY HERE\u003e/tests/\n\n# example below\nextraVolumes: # this will create the volume from the directory\n  - name: dags\n    hostPath:\n      path: /Users/sung/Desktop/airflow-toolkit/dags/\n  - name: dag-environment-configs\n    hostPath:\n      path: /Users/sung/Desktop/airflow-toolkit/dag_environment_configs/\n  - name: kube-config\n    hostPath:\n      path: /Users/sung/Desktop/airflow-toolkit/.kube/\n  - name: service-account\n    hostPath:\n      path: /Users/sung/Desktop/airflow-toolkit/account.json\n  - name: tests\n    hostPath:\n      path: /Users/sung/Desktop/airflow-toolkit/tests/\n```\n\n- Run the below commands in your terminal\n\n```bash\n#!/bin/bash\n# follow terminal prompt after entering below command\n# leave this terminal open to sustain airflow webserver\n# Set of environment variables\nexport ENV=\"dev\"\nexport PROJECT_ID=\"airflow-demo-build\"\nexport DOCKER_DBT_IMG=\"gcr.io/$PROJECT_ID/dbt_docker:$ENV-latest\"\n\nsource deploy_local_desktop_airflow.sh\n```\n\n- Turn on all the DAGs using the on/off button on the left side of the UI\n- After waiting a couple minutes, all the DAGs should succeed\n  \u003e Note: `bigquery_connection_check` will fail unless `add_gcp_connections` succeeds first\n\n![local_desktop_airflow.png](/docs/local_desktop_airflow_success.png)\n\n\u003e Note: the airflow webserver may freeze given resource limitations\n\n\u003e press `ctrl + c` within the terminal where you ran the deploy script\n\n\u003e Run the below commands in your terminal(these already exist within the deploy script)\n\n```bash\n# view airflow UI\nexport 
POD_NAME=$(kubectl get pods --namespace airflow -l \"component=web,app=airflow\" -o jsonpath=\"{.items[0].metadata.name}\")\n\necho \"airflow UI webserver --\u003e http://127.0.0.1:8080\"\n\nkubectl port-forward --namespace airflow $POD_NAME 8080:8080\n```\n\n- Open a SEPARATE terminal and run the below commands\n\n```bash\n# start a remote shell in the airflow worker for ad hoc operations or to run pytests\nkubectl exec -it airflow-worker-0 -- /bin/bash\n```\n\n- Airflow worker remote shell examples\n\n```bash\n➜  airflow-toolkit git:(feature-docs) ✗ kubectl exec -it airflow-worker-0 -- /bin/bash\n\n# list files in current working directory\nairflow@airflow-worker-0:/opt/airflow$ ls\nairflow.cfg  dag_environment_configs  dags  logs  tests  unittests.cfg\n\n# run all test scripts\nairflow@airflow-worker-0:/opt/airflow$ pytest -vv --disable-pytest-warnings\n======================================== test session starts ========================================\nplatform linux -- Python 3.6.10, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python\ncachedir: .pytest_cache\nrootdir: /opt/airflow\nplugins: celery-4.4.2\ncollected 19 items\n\ntests/test_add_gcp_connections.py::test_import_dags PASSED                                    [  5%]\ntests/test_add_gcp_connections.py::test_contains_tasks PASSED                                 [ 10%]\ntests/test_add_gcp_connections.py::test_task_dependencies PASSED                              [ 15%]\ntests/test_add_gcp_connections.py::test_schedule PASSED                                       [ 21%]\ntests/test_add_gcp_connections.py::test_task_count_test_dag PASSED                            [ 26%]\ntests/test_add_gcp_connections.py::test_tasks[t1] PASSED                                      [ 31%]\ntests/test_add_gcp_connections.py::test_tasks[t2] PASSED                                      [ 36%]\ntests/test_add_gcp_connections.py::test_end_to_end_pipeline SKIPPED                           [ 
42%]\ntests/test_dbt_example.py::test_import_dags PASSED                                            [ 47%]\ntests/test_dbt_example.py::test_contains_tasks PASSED                                         [ 52%]\ntests/test_dbt_example.py::test_task_dependencies PASSED                                      [ 57%]\ntests/test_dbt_example.py::test_schedule PASSED                                               [ 63%]\ntests/test_dbt_example.py::test_task_count_test_dag PASSED                                    [ 68%]\ntests/test_dbt_example.py::test_dbt_tasks[dbt_debug] PASSED                                   [ 73%]\ntests/test_dbt_example.py::test_dbt_tasks[dbt_run] PASSED                                     [ 78%]\ntests/test_dbt_example.py::test_dbt_tasks[dbt_test] PASSED                                    [ 84%]\ntests/test_dbt_example.py::test_end_to_end_pipeline SKIPPED                                   [ 89%]\ntests/test_sample.py::test_answer PASSED                                                      [ 94%]\ntests/test_sample.py::test_f PASSED                                                           [100%]\n\n============================= 17 passed, 2 skipped, 1 warning in 39.77s =============================\n\n# list DAGs\nairflow@airflow-worker-0:/opt/airflow$ airflow list_dags\n[2020-08-18 19:17:09,579] {__init__.py:51} INFO - Using executor CeleryExecutor\n[2020-08-18 19:17:09,580] {dagbag.py:396} INFO - Filling up the DagBag from /opt/airflow/dags\nSet custom environment variable GOOGLE_APPLICATION_CREDENTIALS for deployment setup: local_desktop\nSet custom environment variable GOOGLE_APPLICATION_CREDENTIALS for deployment setup: local_desktop\n\n\n-------------------------------------------------------------------\nDAGS\n-------------------------------------------------------------------\nadd_gcp_connections\nbigquery_connection_check\ndbt_example\nkubernetes_sample\ntutorial\n\n# import, get, set airflow variables\nairflow@airflow-worker-0:/opt/airflow$ 
airflow variables --import /opt/airflow/dag_environment_configs/test_airflow_variables.json\n1 of 1 variables successfully updated.\nairflow@airflow-worker-0:/opt/airflow$ airflow variables --get test_airflow_variable\n\ndo you see this?\nairflow@airflow-worker-0:/opt/airflow$ airflow variables --set test_airflow_variable \"lovely\"\nairflow@airflow-worker-0:/opt/airflow$ airflow variables --get test_airflow_variable\nlovely\n```\n\n### How to Destroy\n\n\u003e Time to Complete: 1-2 minutes\n\n```bash\n#!/bin/bash\nsource teardown_local_desktop_airflow.sh\n```\n\n- Example terminal output\n\n```bash\n➜  airflow-toolkit git:(feature-docs) ✗ source teardown_local_desktop_airflow.sh\n***********************\nDelete Kuberenetes Cluster Helm Deployment and Secrets\n***********************\nrelease \"airflow\" uninstalled\nkill: illegal process id: f19\nsecret \"dbt-secret\" deleted\nsecret \"gcr-key\" deleted\nsecret \"ssh-key-secret\" deleted\nnamespace \"airflow\" deleted\n```\n\n### How to give local desktop airflow more horsepower :horse:\n\n- [Scaling airflow resources](https://www.astronomer.io/guides/airflow-scaling-workers/)\n\n### Tradeoffs\n\n#### Pros\n\n- Simple setup and no need to worry about manually destroying airflow cloud infrastructure\n- Free(minimal to no charges to your Google Cloud account)\n- Same example DAGs work as is in cloud infrastructure setup\n- Minimal kubernetes/helm skills required to start\n- Ability to scale infrastructure locally(ex: add more worker pods)\n- Run `pytest` directly within this setup\n\n#### Cons\n\n- Limited to how much processing power and storage you have on your local desktop machine\n- Airflow webserver can freeze occasionally and requires quick restart\n\n### General Kubernetes Concepts\n\n- The local desktop airflow kubernetes cluster will use the service account within the `airflow` namespace to pull the image from Google Container Registry based on the manually created secret: `gcr-key`\n\n```bash\nkubectl get 
serviceaccounts\n\nNAME      SECRETS   AGE\nairflow   1         43m\ndefault   1         43m\n```\n\n- The `KubernetesPodOperator` will pull the docker image based on the permissions above BUT will run `dbt` operations based on the manually created secret: `dbt-secret`\n\n```bash\nkubectl get secrets\n\nNAME                            TYPE                                  DATA   AGE\nairflow-postgresql              Opaque                                1      50m\nairflow-redis                   Opaque                                1      50m\nairflow-token-zfpz8             kubernetes.io/service-account-token   3      50m\ndbt-secret                      Opaque                                1      50m\ndefault-token-pz55g             kubernetes.io/service-account-token   3      50m\ngcr-key                         kubernetes.io/dockerconfigjson        1      50m\nsh.helm.release.v1.airflow.v1   helm.sh/release.v1                    1      50m\n```\n\n### View Local Kubernetes Dashboard\n\n\u003e Optional: Detailed resource management view for local desktop\n\n```bash\n# install kubernetes dashboard\nkubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.1.0/aio/deploy/recommended.yaml\n\n# start the web server\nkubectl proxy\n\n# view the dashboard\nopen http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy#/login\n\n# copy and paste the token output into dashboard UI\nkubectl -n kube-system describe secret $(kubectl -n kube-system get secret | awk '/^deployment-controller-token-/{print $1}') | awk '$1==\"token:\"{print $2}'\n```\n\n![kube_resource_dashboard](/docs/kube_resource_dashboard.png)\n\n\u003e Enter `ctrl + c` within the terminal where you ran the kubernetes dashboard script to close it\n\n---\n\n---\n\n## Toolkit #2: Terragrunt-Driven Terraform Deployment to Google Cloud\n\n**Follow `Post-Deployment Instructions for Toolkits #2 \u0026 #3` instructions AFTER deployment**\n\n\u003e 
Time to Complete: 50-60 minutes(majority of time waiting for cloud composer to finish deploying)\n\n\u003e Note: This follows the example directory structure provided by terragrunt with modules housed in the same git repo-[further reading](https://github.com/gruntwork-io/terragrunt-infrastructure-live-example)\n\n\u003e Do NOT run this in parallel with toolkit #3 as default variables will cause conflicts\n\n### System Design\n\n![terragrunt_deployment](/docs/terragrunt_deployment.png)\n\n\u003e terraform state will be split into separate files, one per module\n\n#### Terragrunt Resources\n\n- [Keep your Terraform code DRY](https://terragrunt.gruntwork.io/docs/features/keep-your-terraform-code-dry/)\n- [Relative Paths](https://community.gruntwork.io/t/relative-paths-in-terragrunt-modules/144/6)\n- [Handling Dependencies](https://community.gruntwork.io/t/handling-dependencies/315/2)\n- [Terraform force unlock](https://www.terraform.io/docs/commands/force-unlock.html)\n- [Third Party Reasons to use terragrunt](https://transcend.io/blog/why-we-use-terragrunt)\n- [Managing Terraform Secrets](https://blog.gruntwork.io/a-comprehensive-guide-to-managing-secrets-in-your-terraform-code-1d586955ace1)\n- [Google Provider Documentation](https://www.terraform.io/docs/providers/google/guides/provider_reference.html#full-reference)\n\n### Specific Use Cases\n\n- Low-cost Google Cloud airflow dev environment($10-$20/day)\n- Test local desktop DAGs against cloud infrastructure that will have more parity with qa and prod environments\n- Add more horsepower to your data pipelines\n- DevOps-friendly infrastructure as code: terraform modules are NOT changed or duplicated, only terragrunt configs change\n\n### How to Deploy\n\n\u003e Read Post-Deployment Instructions for Toolkits #2 \u0026 #3 after this deployment\n\n- Create a service account secret to authorize the terragrunt/terraform deployment\n\n```bash\n#!/bin/bash\n# create a secrets manager secret from the 
key\ngcloud secrets create terraform-secret \\\n    --replication-policy=\"automatic\" \\\n    --data-file=account.json\n\n# List the secret\n# example terminal output\n# NAME                 CREATED              REPLICATION_POLICY  LOCATIONS\n# airflow-conn-secret  2020-08-18T19:45:50  automatic           -\n# terraform-secret     2020-08-12T14:34:50  automatic           -\ngcloud secrets list\n\n\n# verify secret contents ad hoc\ngcloud secrets versions access latest --secret=\"terraform-secret\"\n```\n\n- Replace important terragrunt configs for your specific setup\n\n```hcl\n# file location\n# /airflow-toolkit/terragrunt_infrastructure_live/non-prod/account.hcl\n\nlocals {\n  project               = \"wam-bam-258119\" # replace this with your specific project\n  service_account_email = \"demo-v2@wam-bam-258119.iam.gserviceaccount.com\" # replace this with your specific service account email\n}\n\n# file location\n# /airflow-toolkit/terragrunt_infrastructure_live/terragrunt.hcl\n\nremote_state {\n  backend = \"gcs\"\n  generate = {\n    path      = \"backend.tf\"\n    if_exists = \"overwrite\"\n  }\n  config = {\n    project     = \"${local.project}\"\n    location    = \"${local.region}\"\n    credentials = \"${local.credentials_file}\"\n    bucket      = \"secure-bucket-tfstate-airflow-infra-${local.region}\" # replace with something unique BEFORE `-${local.region}`\n    prefix      = \"${path_relative_to_include()}\"\n  }\n}\n```\n\n```bash\n#!/bin/bash\n# assumes you are already in the repo root directory\ncd terragrunt_infrastructure_live/non-prod/us-central1/dev/\n\n# export the Google Cloud project ID where the secrets live-to be used by terragrunt\n# example: export PROJECT_ID=\"wam-bam-258119\"\nexport PROJECT_ID=\"\u003cyour project id\u003e\"\n\ngcloud config set project $PROJECT_ID #TODO: add this step to the CICD pipeline rather than the get secret shell script\n\n# this has mock outputs to emulate module dependencies with a prefix \"mock-\"\n# OR 
you can run a more specific plan\n# terragrunt run-all plan -out=terragrunt_plan\n# --terragrunt-non-interactive flag if this is run for the first time to create the state gcs bucket without a prompt\n# https://github.com/gruntwork-io/terragrunt/issues/486\nterragrunt run-all plan --terragrunt-non-interactive\n\n# this has mock outputs to emulate module dependencies\nterragrunt run-all validate\n\n# follow terminal prompt after entering below command\n# do NOT interrupt this process until finished or it will corrupt terraform state\n# OR you can apply a more specific plan\n# terragrunt run-all apply terragrunt_plan\nterragrunt run-all apply\n\n```\n\n### How to Destroy\n\n\u003e Time to Complete: 5-10 minutes\n\n```bash\n#!/bin/bash\n# follow terminal prompt after entering below command\nterragrunt destroy-all\n\n# you may occasionally see terragrunt errors related to duplicate files\n# run the below often to avoid those errors\ncd terragrunt_infrastructure_live/\n\nbash terragrunt_cleanup.sh\n```\n\n### Tradeoffs\n\n#### Pros\n\n- Explicit terraform module dependencies(through terragrunt functionality)\n- Keeps your terraform code DRY\n- terraform state is divided by module\n- Separate config and terraform module management\n- Ability to spin up multiple environments(dev, qa) with a single `terragrunt plan-all` command within the dir: `./terragrunt_infrastructure_live/non-prod/`\n- Same example DAGs work as is in local desktop setup\n- Minimal kubernetes skills required to start\n- Access to more virtual horsepower(ex: add more worker pods through cloud composer configs)\n- Dynamically authorizes infrastructure operations through secrets manager\n- A DevOps engineer should only need to copy and paste the dev terragrunt configs and update inputs for other environments(qa, prod)\n- VPC-native, private-IP, bastion host, ssh via identity aware proxy, and other security-based, reasonable defaults(minimal touchpoints with the public internet)\n\n#### Cons\n\n- You must 
destroy your respective dev environments every day or risk accruing costs for idle resources\n- Time to complete is long\n- Destroying all the infrastructure through terragrunt/terraform does NOT automatically destroy the cloud composer gcs bucket used to sync DAGs-[further reading](https://www.terraform.io/docs/providers/google/r/composer_environment.html)\n- If you customize the VPC to prevent any kind of public internet access, cloud composer will not deploy properly(as it needs to reach out to pypi for python dependencies)\n- Need to learn another tool: terragrunt(definitely worth it)\n\n---\n\n---\n\n## Toolkit #3: Simple Terraform Deployment to Google Cloud\n\n**Follow `Post-Deployment Instructions for Toolkits #2 \u0026 #3` instructions AFTER deployment**\n\n\u003e Time to Complete: 50-60 minutes(majority of time waiting for cloud composer to finish deploying)\n\n\u003e Note: This uses terragrunt as a thin wrapper within a single subdirectory\n\n\u003e Do NOT run this in parallel with toolkit #2 as default variables will cause conflicts\n\n### System Design\n\n![terragrunt_deployment](/docs/terragrunt_deployment.png)\n\n\u003e terraform state will be stored in one file\n\n### Specific Use Cases\n\n- Best used for quick and easy setup for a data engineer, NOT intended for hand off to a DevOps engineer\n- Low cost Google Cloud dev airflow dev environment($10-$20/day)\n- Test local desktop DAGs against cloud infrastructure that will have more parity with qa and prod environments\n- Add more horsepower to your data pipelines\n\n### How to Deploy\n\n\u003e Read Post-Deployment Instructions for Toolkits #2 \u0026 #3 after this deployment\n\n- Create a service account secret to authorize the terraform deployment\n\n```bash\n#!/bin/bash\n# create a secrets manager secret from the key\ngcloud secrets create terraform-secret \\\n    --replication-policy=\"automatic\" \\\n    --data-file=account.json\n\n# List the secret\n# example terminal output\n# NAME               
  CREATED              REPLICATION_POLICY  LOCATIONS\n# airflow-conn-secret  2020-08-18T19:45:50  automatic           -\n# terraform-secret     2020-08-12T14:34:50  automatic           -\ngcloud secrets list\n\n\n# verify secret contents ad hoc\ngcloud secrets versions access latest --secret=\"terraform-secret\"\n```\n\n- replace important terragrunt/terraform configs for your specific setup\n\n```hcl\n# file location\n# /airflow-toolkit/terraform_simple_setup/terragrunt.hcl\n\nremote_state {\n  backend = \"gcs\"\n  generate = {\n    path      = \"backend.tf\"\n    if_exists = \"overwrite\"\n  }\n  config = {\n    project     = \"wam-bam-258119\" # replace with your GCP project id\n    location    = \"US\"\n    credentials = \"service_account.json\"\n    bucket      = \"secure-bucket-tfstate-composer\" # replace with something unique\n    prefix      = \"dev\"\n  }\n}\n```\n\n```hcl\n# file location\n# /airflow-toolkit/terraform_simple_setup/variables.tf\n\nvariable \"project\" {\n  description = \"name of your GCP project\"\n  type        = string\n  default     = \"big-dreams-please\" # replace with your GCP project id\n}\n\nvariable \"service_account_email\" {\n  description = \"Service account used for VMs\"\n  type        = string\n  default     = \"demo-service-account@big-dreams-please.iam.gserviceaccount.com\" # replace with your service account email\n}\n```\n\n- Copy and paste the `account.json` into the directory below and rename it `service_account.json`\n  \u003e Avoids the hassle of calling the terraform-secret for this simple terraform setup\n\n```bash\n#!/bin/bash\n\ncd terraform_simple_setup/\n\n# utilizes terragrunt as a thin wrapper utility to automatically create the gcs backend remote state bucket\nterragrunt init\n\n# preview the cloud resources you will create\n# OR you can run a more specific plan\n# terraform plan -out=terraform_plan\nterraform plan\n\n# validate terraform syntax and configuration\nterraform validate\n\n# follow terminal 
prompt after entering below command\n# OR you can apply a more specific plan\n# terraform apply terraform_plan\nterraform apply\n```\n\n### How to Destroy\n\n\u003e Time to Complete: 5-10 minutes\n\n```bash\n#!/bin/bash\n# follow terminal prompt after entering below command\nterraform destroy\n```\n\n### Tradeoffs\n\n#### Pros\n\n- Explicit terraform module dependencies(through built-in terraform functionality)\n- Separate config and terraform module management\n- Same example DAGs work as is in local desktop setup\n- Minimal kubernetes skills required to start\n- Access to more virtual horsepower(ex: add more worker pods through cloud composer configs)\n- Dynamically authorizes infrastructure operations through secrets manager\n- A DevOps engineer should only need to copy and paste the dev terragrunt configs and update inputs for other environments(qa, prod)\n- VPC-native, private-IP, bastion host, ssh via identity aware proxy, and other security-based, reasonable defaults(minimal touchpoints with the public internet)\n\n#### Cons\n\n- You must destroy your respective dev environments every day or risk accruing costs for idle resources\n- Time to complete is long\n- Destroying all the infrastructure through terragrunt/terraform does NOT automatically destroy the cloud composer gcs bucket used to sync DAGs-[further reading](https://www.terraform.io/docs/providers/google/r/composer_environment.html)\n- If you customize the VPC to prevent any kind of public internet access, cloud composer will not deploy properly(as it needs to reach out to pypi for python dependencies)\n- To create a similar environment(qa, prod), you would have to copy and paste the terraform modules into a separate directory and run the above steps. 
Terraform code is NOT DRY\n- The DevOps engineer has to take on a lot more work to maintain this deployment across several environments\n- terraform state is all contained within one file\n\n---\n\n---\n\n## Post-Deployment Instructions for Toolkits #2 \u0026 #3\n\n\u003e Time to Complete: 5-10 minutes\n\u003e\n\u003e Only compute instances on the same VPC as Cloud Composer can access the environment programmatically\n\u003e\n\u003e `gcloud composer` commands will NOT work on your local desktop\n\n- After the terragrunt/terraform deployment succeeds, run the commands below in your local desktop terminal\n- Add the `Compute Instance Admin (v1)` and `Service Account User` roles to the iap ssh service account (requires fewer terraform code changes) OR create a custom role with `compute.instances.setMetadata` (requires more terraform code changes)\n\n\u003e If you are the owner of the project, you can skip the identity aware proxy ssh step and simply ssh through the console itself\n\n```bash\n#!/bin/bash\n\n# ssh via identity aware proxy into the bastion host(which will then run commands against cloud composer)\n# update the env vars before running ssh tunnel\nACCESS_KEY_FILE=\"account.json\"\nPROJECT_ID=\"airflow-demo-build\" # your GCP project ID\nZONE=\"us-central1-b\" # your GCP compute engine ZONE defined in terraform/terragrunt variables, likely us-central1-a or us-central1-b\n# SERVICE_ACCOUNT_EMAIL=\"service-account-iap-ssh@$PROJECT_ID.iam.gserviceaccount.com\" # Toolkit 3 Default\nSERVICE_ACCOUNT_EMAIL=\"iap-ssh-sa-dev@$PROJECT_ID.iam.gserviceaccount.com\" # Toolkit 2 Default\nKEY_FILE=\"iap-ssh-access-sa.json\"\nsource utils/cloud_composer/iap_ssh_tunnel.sh\n\n# install basic software in the bastion host\nsudo apt-get install kubectl git\n\n# Set Composer project, location, and zone\n# The hard-coded values are based on defaults set by terraform module variables\n# Minimizes redundant flags in downstream commands\ngcloud config set project airflow-demo-build # your GCP project ID\ngcloud 
config set composer/location us-central1\ngcloud config set compute/zone us-central1-b # your GCP compute engine ZONE defined in terraform/terragrunt variables, likely us-central1-a or us-central1-b\n\n# list cloud composer DAGs\ngcloud composer environments run dev-composer \\\n    list_dags\n\n# capture cloud composer environment config\nCOMPOSER_ENVIRONMENT=\"dev-composer\"\nCOMPOSER_CONFIG=$(gcloud composer environments describe ${COMPOSER_ENVIRONMENT} --format='value(config.gkeCluster)')\n# COMPOSER_CONFIG ex: projects/wam-bam-258119/zones/us-central1-b/clusters/us-central1-dev-composer-de094856-gke\n\n# capture kubernetes credentials and have kubectl commands point to this cluster\ngcloud container clusters get-credentials $COMPOSER_CONFIG\n\n# copy and paste contents of service account json file from local machine into the bastion host\ncat \u003c\u003cEOF \u003e account.json\n\u003cpaste service account file contents\u003e\nEOF\n\n# be very careful with naming convention for this secret or else the KubernetesPodOperator will timeout\nkubectl create secret generic dbt-secret --from-file=account.json\n\n# Create SSH key pair for secure git clones\nssh-keygen\n\n# copy and paste contents to your git repo SSH keys section\n# https://github.com/settings/keys\ncat ~/.ssh/id_rsa.pub\n\n# create the ssh key secret\nkubectl create secret generic ssh-key-secret \\\n  --from-file=id_rsa=$HOME/.ssh/id_rsa \\\n  --from-file=id_rsa.pub=$HOME/.ssh/id_rsa.pub\n\nkubectl get secrets\n```\n\n- Open a separate terminal to run the below\n\n```bash\n#!/bin/bash\n# these commands work from the `airflow-toolkit/` root directory\n\n# reauthorize the main service account to gcloud\ngcloud auth activate-service-account --key-file account.json\n\n# add secrets manager IAM policy binding to composer service account\n# The hard-code values are based on defaults set by terraform module 
variables\nPROJECT_ID=\"airflow-demo-build\"\nMEMBER_SERVICE_ACCOUNT_EMAIL=\"serviceAccount:composer-sa-dev@$PROJECT_ID.iam.gserviceaccount.com\" # Toolkit 2 Default\n# MEMBER_SERVICE_ACCOUNT_EMAIL=\"serviceAccount:composer-dev-account@$PROJECT_ID.iam.gserviceaccount.com\" # Toolkit 3 Default\nSECRET_ID=\"airflow-conn-secret\"\n\ngcloud secrets add-iam-policy-binding $SECRET_ID \\\n    --member=$MEMBER_SERVICE_ACCOUNT_EMAIL \\\n    --role=\"roles/secretmanager.secretAccessor\"\n\n# Configure variables to interact with cloud composer\nexport PROJECT_DIR=$PWD\n\n# Set Composer location\ngcloud config set composer/location us-central1\n\nCOMPOSER_ENVIRONMENT=\"dev-composer\"\nCOMPOSER_BUCKET=$(gcloud composer environments describe ${COMPOSER_ENVIRONMENT} --format='value(config.dagGcsPrefix)' | sed 's/\\/dags//g')\n\n# sync files in dags folder to the gcs bucket linked to cloud composer\n# this may not work if you have python 3.8.5 installed on macOS\n# see: https://github.com/GoogleCloudPlatform/gsutil/issues/961\ngsutil -m rsync -r $PROJECT_DIR/dags $COMPOSER_BUCKET/dags\n```\n\n\u003e Note: The airflow webserver will take 30 seconds to update the view with the updated DAGs. 
However, you can run DAGs as soon as you upload the new files to the gcs bucket.\n\n- [Instructions to access the airflow webserver UI](https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface?hl=fi#accessing_the_web_interface_via_the)\n- Rerun all DAGs within cloud composer for success validation\n  \u003e Note: `bigquery_connection_check` will fail unless `add_gcp_connections` succeeds first\n\n---\n\n---\n\n## Git Repo Folder Structure\n\n| Folder                            | Purpose                                                                    |\n| --------------------------------- | -------------------------------------------------------------------------- |\n| .github/workflows                 | Quick terragrunt/terraform validations                                     |\n| dags                              | airflow pipeline code                                                      |\n| dags_archive                      | draft DAG code                                                             |\n| dbt_bigquery_example              | Working and locally tested dbt code which performs BigQuery SQL transforms |\n| Dockerfiles                       | Docker images to be used by Cloud Composer                                 |\n| docs                              | Images and other relevant documentation                                    |\n| terraform_simple_setup            | Terraform modules for a terraform-only setup                               |\n| terragrunt_infrastructure_live    | Terragrunt orchestrator to run terraform operations                        |\n| terragrunt_infrastructure_modules | Base terraform modules for terragrunt to consume in the `live` directory   |\n| tests                             | Example DAG test cases                                                     |\n| utils                             | Various utilities to automate more specific ad hoc tasks                   |\n\n## Frequently Asked 
Questions(FAQ)\n\n- Why do you use identity aware proxy for remote ssh access into the bastion host?\n  - Creates a readable audit trail for ssh access patterns\n- Do you have an equivalent deployment repo for AWS/Azure?\n  - No, but I am more than open to a pull request that adds one\n- Why not use terratest?\n  - I don't know the Go programming language well enough, but plan to learn in the future ;)\n- Why pytest?\n  - Less verbose testing framework compared to Python's built-in unittest\n  - Already had battle-tested boilerplate testing code\n\n## Resources\n\n- [Helm Quickstart](https://helm.sh/docs/intro/quickstart/)\n- [Helm Chart Official Release](https://artifacthub.io/packages/helm/airflow-helm/airflow)\n- [Helm Chart Source Code](https://github.com/airflow-helm/charts/tree/main/charts/airflow)\n- [SQLite issue](https://github.com/helm/charts/issues/22477)\n- [kubectl commands](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands)\n- [What is a pod?](https://kubernetes.io/docs/concepts/workloads/pods/pod/)\n- [Kubernetes Dashboard for Docker Desktop](https://medium.com/backbase/kubernetes-in-local-the-easy-way-f8ef2b98be68)\n- [Cost effective way to scale the airflow scheduler](https://medium.com/@royzipuff/the-smarter-way-of-scaling-with-composers-airflow-scheduler-on-gke-88619238c77b)\n- [Kubernetes on Docker Desktop Limitations](https://docs.docker.com/docker-for-mac/kubernetes/)\n- [Installing Homebrew in GitHub Actions](https://github.community/t/installing-homebrew-on-linux/17994)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungchun12%2Fairflow-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsungchun12%2Fairflow-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungchun12%2Fairflow-toolkit/lists"}