https://github.com/villasv/aws-airflow-stack
Turbine: the bare metals that gets you Airflow
airflow airflow-cluster airflow-cookbook aws aws-cloudformation
- Host: GitHub
- URL: https://github.com/villasv/aws-airflow-stack
- Owner: villasv
- License: mit
- Archived: true
- Created: 2017-06-01T14:12:44.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-10-10T21:09:41.000Z (over 3 years ago)
- Last Synced: 2024-07-31T21:55:41.647Z (5 months ago)
- Topics: airflow, airflow-cluster, airflow-cookbook, aws, aws-cloudformation
- Language: Python
- Homepage: https://victor.villas/aws-airflow-stack/
- Size: 2.6 MB
- Stars: 374
- Watchers: 11
- Forks: 69
- Open Issues: 26
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-apache-airflow-br - aws-airflow-stack - An AWS-based deployment cluster with CeleryExecutor. It deploys after a few clicks with CloudFormation (Airflow deployment solutions)
- awesome-apache-airflow - aws-airflow-stack - An AWS based Airflow cluster deployment with CeleryExecutor. Deploys after a few clicks with CloudFormation. (Airflow deployment solutions)
README
> ⚠️ This project is no longer receiving updates. We have moved on from using CloudFormation to manage our infrastructure and recommend that others do the same. As of 2021/2022, we think CDK and Terraform are now the best in class for IaC. Also, Airflow for Kubernetes has gotten more traction and we have moved into EKS for our self-managed Airflow. If you're looking for Open Source Airflow deployment options, we recommend [Astronomer](https://www.astronomer.io/).
# Turbine [![GitHub Release](https://img.shields.io/github/release/villasv/aws-airflow-stack.svg?style=flat-square&logo=github)](https://github.com/villasv/aws-airflow-stack/releases/latest) [![Build Status](https://img.shields.io/github/workflow/status/villasv/aws-airflow-stack/Stack%20Release%20Pipeline?style=flat-square&logo=github&logoColor=white&label=build)](https://github.com/villasv/aws-airflow-stack/actions?query=workflow%3A%22Stack+Release+Pipeline%22+branch%3Amaster) [![CFN Deploy](https://img.shields.io/badge/CFN-deploy-green.svg?style=flat-square&logo=amazon-aws)](#get-it-working)
Turbine is the set of bare metals behind a simple yet complete and efficient
Airflow setup.

The project is intended to be easily deployed, making it great for testing,
demos and showcasing Airflow solutions. It is also expected to be easily
tinkered with, allowing it to be used in real production environments with
little extra effort. Deploy in a few clicks, personalize in a few fields,
configure in a few commands.

## Overview
![stack diagram](/.github/img/stack-diagram.png)
The stack is composed mainly of three services: the Airflow web server, the
Airflow scheduler, and the Airflow worker. Supporting resources include an RDS
to host the Airflow metadata database, an SQS to be used as broker backend, S3
buckets for logs and deployment bundles, an EFS to serve as shared directory,
and a custom CloudWatch metric measured by a timed AWS Lambda. All other
resources are the usual boilerplate to keep the wind blowing.

### Deployment and File Sharing
The deployment process through CodeDeploy is very flexible and can be tailored
for each project structure, the only invariant being the Airflow home directory
at `/airflow`. It ensures that every Airflow process has the same files and can
be upgraded gracefully, but most importantly makes deployments really fast and
easy to begin with.

There's also an EFS shared directory mounted at `/mnt/efs`, which can be
useful for staging files potentially used by workers on different machines and
other synchronization scenarios commonly found in ETL/Big Data applications. It
facilitates migrating legacy workloads not ready for running on distributed
workers.

### Workers and Auto Scaling
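As a rough illustration of a queue-based load estimate like the one this section describes, the sketch below turns counts of empty and total queue polls into a load fraction and a scaling hint. The function names, the formula, and the thresholds are all assumptions made for illustration; they are not taken from Turbine's templates.

```python
def estimate_load(empty_polls: int, total_polls: int) -> float:
    """Return the fraction of queue polls that found a task.

    0.0 means every poll came back empty (idle cluster);
    1.0 means no poll came back empty (saturated cluster).
    """
    if total_polls <= 0 or not 0 <= empty_polls <= total_polls:
        raise ValueError("need total_polls > 0 and 0 <= empty_polls <= total_polls")
    return 1.0 - empty_polls / total_polls


def scaling_hint(load: float, low: float = 0.25, high: float = 0.75) -> str:
    """Map a load fraction to a coarse action (thresholds are hypothetical)."""
    if load > high:
        return "scale out"
    if load < low:
        return "scale in"
    return "hold"


if __name__ == "__main__":
    # Workers polled 100 times and 90 polls returned nothing: mostly idle.
    print(scaling_hint(estimate_load(empty_polls=90, total_polls=100)))
```

A ratio like this reacts to how busy the queue is rather than to CPU or memory usage, which is the distinction this section draws.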
The stack includes an estimate of the cluster load average made by analyzing the
number of failed attempts to retrieve a task from the queue. The metric's
objective is to measure whether the cluster is correctly sized for the influx of
tasks. Worker instances have lifecycle hooks promoting a graceful shutdown,
waiting for task completion when terminating.

The goal of the auto scaling feature is to respond to changes in queue load,
which could mean an idle cluster becoming active or a busy cluster becoming
idle, the start or end of a backfill, many DAGs with similar schedules hitting
their due time, or DAGs that branch to many parallel operators. **Scaling in
response to machine resources, like facing CPU-intensive tasks, is not the
goal**; that is a very advanced scenario and would be best handled by Celery's
own scaling mechanism or by offloading the computation to another system (like
Spark or Kubernetes) and using Airflow only for orchestration.

## Get It Working
### 0. Prerequisites
- Configured AWS CLI for deploying your own files
[(Guide)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)

### 1. Deploy the stack
Create a new stack using the latest template definition at
[`templates/turbine-master.template`](/templates/turbine-master.template). The
following button will deploy the stack available in this project's `master`
branch (defaults to your last used region):

[![Launch](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/images/cloudformation-launch-stack-button.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/new?templateURL=https://turbine-quickstart.s3.amazonaws.com/quickstart-turbine-airflow/templates/turbine-master.template)
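If you would rather not depend on the last used region, a launch link can be pinned to one. The sketch below is a minimal illustration that assumes the CloudFormation console accepts a `region` query parameter before the `#` fragment and a `stackName` parameter next to `templateURL`; the helper name `launch_url` is hypothetical.

```python
from urllib.parse import quote

# Template URL copied from the launch button above.
TEMPLATE_URL = (
    "https://turbine-quickstart.s3.amazonaws.com/"
    "quickstart-turbine-airflow/templates/turbine-master.template"
)


def launch_url(region: str, stack_name: str, template_url: str = TEMPLATE_URL) -> str:
    """Build a console link that opens the stack creation page in one region."""
    return (
        "https://console.aws.amazon.com/cloudformation/home"
        f"?region={quote(region)}"
        f"#/stacks/new?stackName={quote(stack_name)}&templateURL={template_url}"
    )


if __name__ == "__main__":
    print(launch_url("us-east-1", "yourcoolstackname"))
```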
The stack resources take around 15 minutes to create, while the Airflow
installation and bootstrap take another 3 to 5 minutes. After that you can
access the Airflow UI and deploy your own Airflow DAGs.

### 2. Upstream your files
The only requirement is that you configure the deployment to copy your Airflow
home directory to `/airflow`. After crafting your `appspec.yml`, you can use the
AWS CLI to deploy your project.

For convenience, you can use this [`Makefile`](/examples/project/Makefile) to
handle the packaging, upload and deployment commands. A minimal working example
of an Airflow project to deploy can be found at
[`examples/project/airflow`](/examples/project/airflow).

If you follow this blueprint, a deployment is as simple as:
```bash
make deploy stack-name=yourcoolstackname
```

## Maintenance and Operation
Sometimes the cluster operators will want to perform some additional setup,
debug or just inspect the Airflow services and database. The stack is designed
to minimize this need, but just in case it also offers decent internal tooling
for those scenarios.

### Using Systems Manager Sessions
Instead of the usual SSH procedure, this stack encourages the use of AWS Systems
Manager Sessions for increased security and auditing capabilities. You can still
use the CLI after a bit more configuration, and not having to expose your
instances or create bastion instances is worth the effort. You can read more
about it in the Session Manager
[docs](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html).

### Running Airflow commands
The environment variables used by the Airflow service are not immediately
available in the shell. Before running Airflow commands, you need to load the
Airflow configuration:

```bash
$ export $(xargs
```

>This project aims to be constantly evolving with up to date tooling and newer
>AWS features, as well as improving its design qualities and maintainability.
>Requests for Enhancement should be abundant and anyone is welcome to pick them
>up.
>
>Stacks can get quite opinionated. If you have a divergent fork, you may open a
>Request for Comments and we will index it. Hopefully this will help to build a
>diverse set of possible deployment models for various production needs.

See the [contribution guidelines](/CONTRIBUTING.md) for details.
You may also want to take a look at the [Citizen Code of
Conduct](/CODE_OF_CONDUCT.md).

Did this project help you? Consider buying me a cup of coffee ;-)
[![Buy me a coffee!](https://www.buymeacoffee.com/assets/img/custom_images/white_img.png)](https://www.buymeacoffee.com/villasv)
## Licensing
> MIT License
>
> Copyright (c) 2017 Victor Villas

See the [license file](/LICENSE) for details.