{"id":13416235,"url":"https://github.com/villasv/aws-airflow-stack","last_synced_at":"2025-03-14T23:31:27.546Z","repository":{"id":173135722,"uuid":"93064492","full_name":"villasv/aws-airflow-stack","owner":"villasv","description":"Turbine: the bare metals that gets you Airflow","archived":true,"fork":false,"pushed_at":"2021-10-10T21:09:41.000Z","size":2730,"stargazers_count":378,"open_issues_count":26,"forks_count":69,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-14T22:57:44.096Z","etag":null,"topics":["airflow","airflow-cluster","airflow-cookbook","aws","aws-cloudformation"],"latest_commit_sha":null,"homepage":"https://victor.villas/aws-airflow-stack/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/villasv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"custom":"https://www.buymeacoffee.com/villasv"}},"created_at":"2017-06-01T14:12:44.000Z","updated_at":"2025-02-20T11:05:46.000Z","dependencies_parsed_at":"2024-01-14T12:29:20.618Z","dependency_job_id":null,"html_url":"https://github.com/villasv/aws-airflow-stack","commit_stats":null,"previous_names":["villasv/aws-airflow-stack"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/villasv%2Faws-airflow-stack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/villasv%2Faws-airflow-stack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/villasv%2Faws-airflow-stack/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/villasv%2Faws-airflow-stack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/villasv","download_url":"https://codeload.github.com/villasv/aws-airflow-stack/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243663502,"owners_count":20327299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-cluster","airflow-cookbook","aws","aws-cloudformation"],"created_at":"2024-07-30T21:00:55.845Z","updated_at":"2025-03-14T23:31:23.494Z","avatar_url":"https://github.com/villasv.png","language":"Python","funding_links":["https://www.buymeacoffee.com/villasv"],"categories":["Python","Soluções de deployment do Airflow","Airflow deployment solutions"],"sub_categories":[],"readme":"\u003e ⚠️ This project is no longer receiving updates. We have moved on from using CloudFormation to manage our infrastructure and recommend that others do the same. As of 2021/2022, we think CDK and Terraform are the best in class for IaC. Also, Airflow for Kubernetes has gained more traction and we have moved to EKS for our self-managed Airflow. 
If you're looking for Open Source Airflow deployment options, we recommend [Astronomer](https://www.astronomer.io/).\n\n\u003cimg src=\".github/img/logo.png\" align=\"right\" width=\"25%\" /\u003e\n\n# Turbine [![GitHub Release](https://img.shields.io/github/release/villasv/aws-airflow-stack.svg?style=flat-square\u0026logo=github)](https://github.com/villasv/aws-airflow-stack/releases/latest) [![Build Status](https://img.shields.io/github/workflow/status/villasv/aws-airflow-stack/Stack%20Release%20Pipeline?style=flat-square\u0026logo=github\u0026logoColor=white\u0026label=build)](https://github.com/villasv/aws-airflow-stack/actions?query=workflow%3A%22Stack+Release+Pipeline%22+branch%3Amaster) [![CFN Deploy](https://img.shields.io/badge/CFN-deploy-green.svg?style=flat-square\u0026logo=amazon-aws)](#get-it-working)\n\nTurbine is the set of bare metals behind a simple yet complete and efficient\nAirflow setup.\n\nThe project is intended to be easily deployed, making it great for testing,\ndemos and showcasing Airflow solutions. It is also expected to be easily\ntinkered with, allowing it to be used in real production environments with\nlittle extra effort. Deploy in a few clicks, personalize in a few fields,\nconfigure in a few commands.\n\n## Overview\n\n![stack diagram](/.github/img/stack-diagram.png)\n\nThe stack is composed mainly of three services: the Airflow web server, the\nAirflow scheduler, and the Airflow worker. Supporting resources include an RDS\nto host the Airflow metadata database, an SQS to be used as broker backend, S3\nbuckets for logs and deployment bundles, an EFS to serve as shared directory,\nand a custom CloudWatch metric measured by a timed AWS Lambda. All other\nresources are the usual boilerplate to keep the wind blowing.\n\n### Deployment and File Sharing\n\nThe deployment process through CodeDeploy is very flexible and can be tailored\nfor each project structure, the only invariant being the Airflow home directory\nat `/airflow`. 
It ensures that every Airflow process has the same files and can be\nupgraded gracefully, but most importantly it makes deployments fast and easy\nto begin with.\n\nThere's also an EFS shared directory mounted at `/mnt/efs`, which can be\nuseful for staging files potentially used by workers on different machines and\nfor other synchronization scenarios commonly found in ETL/Big Data applications. It\nalso facilitates migrating legacy workloads that are not ready to run on distributed\nworkers.\n\n### Workers and Auto Scaling\n\nThe stack includes an estimate of the cluster load average, made by analyzing the\nnumber of failed attempts to retrieve a task from the queue. The metric's\nobjective is to measure whether the cluster is correctly sized for the influx of\ntasks. Worker instances have lifecycle hooks that promote a graceful shutdown,\nwaiting for task completion when terminating.\n\nThe goal of the auto scaling feature is to respond to changes in queue load,\nwhich could mean an idle cluster becoming active or a busy cluster becoming\nidle, the start/end of a backfill, many DAGs with similar schedules hitting\ntheir due time, or DAGs that branch to many parallel operators. **Scaling in\nresponse to machine resource pressure, such as CPU-intensive tasks, is not the goal**;\nthat is a very advanced scenario and would be best handled by Celery's own\nscaling mechanism or by offloading the computation to another system (like Spark or\nKubernetes) and using Airflow only for orchestration.\n\n## Get It Working\n\n### 0. Prerequisites\n\n- Configured AWS CLI for deploying your own files\n  [(Guide)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)\n\n### 1. Deploy the stack\n\nCreate a new stack using the latest template definition at\n[`templates/turbine-master.template`](/templates/turbine-master.template). 
The\nfollowing button will deploy the stack available in this project's `master`\nbranch (defaults to your last used region):\n\n[![Launch](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/images/cloudformation-launch-stack-button.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/new?templateURL=https://turbine-quickstart.s3.amazonaws.com/quickstart-turbine-airflow/templates/turbine-master.template)\n\nThe stack resources take around 15 minutes to create, while the Airflow\ninstallation and bootstrap take another 3 to 5 minutes. After that you can\naccess the Airflow UI and deploy your own Airflow DAGs.\n\n### 2. Upstream your files\n\nThe only requirement is that you configure the deployment to copy your Airflow\nhome directory to `/airflow`. After crafting your `appspec.yml`, you can use the\nAWS CLI to deploy your project.\n\nFor convenience, you can use this [`Makefile`](/examples/project/Makefile) to\nhandle the packaging, upload and deployment commands. A minimal working example\nof an Airflow project to deploy can be found at\n[`examples/project/airflow`](/examples/project/airflow).\n\nIf you follow this blueprint, a deployment is as simple as:\n\n```bash\nmake deploy stack-name=yourcoolstackname\n```\n\n## Maintenance and Operation\n\nSometimes cluster operators will want to perform some additional setup,\ndebug, or just inspect the Airflow services and database. The stack is designed\nto minimize this need, but just in case it also offers decent internal tooling\nfor those scenarios.\n\n### Using Systems Manager Sessions\n\nInstead of the usual SSH procedure, this stack encourages the use of AWS Systems\nManager Sessions for increased security and auditing capabilities. You can still\nuse the CLI after a bit more configuration, and not having to expose your\ninstances or create bastion instances is worth the effort. 
You can read more\nabout it in the Session Manager\n[docs](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html).\n\n### Running Airflow commands\n\nThe environment variables used by the Airflow service are not immediately\navailable in the shell. Before running Airflow commands, you need to load the\nAirflow configuration:\n\n```bash\n$ export $(xargs \u003c/etc/sysconfig/airflow.env)\n$ airflow list_dags\n```\n\n### Inspecting service logs\n\nThe Airflow service runs under `systemd`, so logs are available through\n`journalctl`. The most commonly used arguments are `--follow`, to keep the logs\ncoming, and `--no-pager`, to dump the text lines directly, but it offers [much\nmore](https://www.freedesktop.org/software/systemd/man/journalctl.html).\n\n```bash\n$ sudo journalctl -u airflow -n 50\n```\n\n\n## FAQ\n\n1. Why does auto scaling take so long to kick in?\n\n    AWS doesn't provide minute-level granularity on SQS metrics, only 5-minute\n    aggregates. Also, CloudWatch stamps aggregate metrics with their initial\n    timestamp, meaning that the latest stable SQS metrics are from 10 minutes in\n    the past. This is why the load metric is always delayed by 5 to 10 minutes. To\n    avoid oscillating allocations, the alarm action has a 10-minute cooldown.\n\n2. Why can't I stop running tasks by terminating all workers?\n\n    Workers have lifecycle hooks that make sure to wait for Celery to finish its\n    tasks before allowing EC2 to terminate that instance (except maybe for Spot\n    Instances going out of capacity). If you want to kill running tasks, you\n    will need to SSH into worker instances and stop the Airflow service\n    forcefully.\n\n3. Is there any documentation around the architectural decisions?\n\n    Yes, most of them should be available in the project's GitHub\n    [Wiki](https://github.com/villasv/aws-airflow-stack/wiki). 
That doesn't mean\n    those decisions are final, but reading them beforehand will help in formulating\n    new proposals.\n\n## Contributing\n\n\u003eThis project aims to be constantly evolving with up-to-date tooling and newer\n\u003eAWS features, as well as improving its design qualities and maintainability.\n\u003eRequests for Enhancement should be abundant and anyone is welcome to pick them\n\u003eup.\n\u003e\n\u003eStacks can get quite opinionated. If you have a divergent fork, you may open a\n\u003eRequest for Comments and we will index it. Hopefully this will help to build a\n\u003ediverse set of possible deployment models for various production needs.\n\nSee the [contribution guidelines](/CONTRIBUTING.md) for details.\n\nYou may also want to take a look at the [Citizen Code of\nConduct](/CODE_OF_CONDUCT.md).\n\nDid this project help you? Consider buying me a cup of coffee ;-)\n\n[![Buy me a coffee!](https://www.buymeacoffee.com/assets/img/custom_images/white_img.png)](https://www.buymeacoffee.com/villasv)\n\n## Licensing\n\n\u003e MIT License\n\u003e\n\u003e Copyright (c) 2017 Victor Villas\n\nSee the [license file](/LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvillasv%2Faws-airflow-stack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvillasv%2Faws-airflow-stack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvillasv%2Faws-airflow-stack/lists"}