https://github.com/abhioncbr/docker-airflow
Repo for building docker based airflow image. Containers support multiple features like writing logs to local or S3 folder and Initializing GCP while container booting. https://abhioncbr.github.io/docker-airflow/
https://github.com/abhioncbr/docker-airflow
airflow airflow-docker aws-s3 celery distributed-systems docker docker-airflow gcp-container-builder redis scheduler work-manager
Last synced: 3 months ago
JSON representation
Repo for building docker based airflow image. Containers support multiple features like writing logs to local or S3 folder and Initializing GCP while container booting. https://abhioncbr.github.io/docker-airflow/
- Host: GitHub
- URL: https://github.com/abhioncbr/docker-airflow
- Owner: abhioncbr
- License: apache-2.0
- Created: 2018-06-13T15:36:12.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2019-06-02T23:17:25.000Z (about 7 years ago)
- Last Synced: 2025-05-14T17:39:54.360Z (about 1 year ago)
- Topics: airflow, airflow-docker, aws-s3, celery, distributed-systems, docker, docker-airflow, gcp-container-builder, redis, scheduler, work-manager
- Language: Python
- Homepage: https://hub.docker.com/u/abhioncbr/
- Size: 1.04 MB
- Stars: 32
- Watchers: 3
- Forks: 5
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
[
](https://airflow.apache.org/)
# docker-airflow
[](https://circleci.com/gh/abhioncbr/docker-airflow/tree/master)
[](http://www.apache.org/licenses/LICENSE-2.0.txt)
[](https://codeclimate.com/github/abhioncbr/docker-airflow)
This is a repository for building [Docker](https://www.docker.com/) container of [Apache Airflow](https://airflow.apache.org/) ([incubating](https://incubator.apache.org/)).
* For understanding & knowing more about Airflow, please follow [curated list of resources](https://github.com/jghoman/awesome-apache-airflow).
* Similarly, for Docker follow [curated list of resources](https://github.com/veggiemonk/awesome-docker).
## Images
|Image|Pulls|Tags|
|:---|:---:|:---:|
|abhioncbr/docker-airflow|[](https://cloud.docker.com/u/abhioncbr/repository/docker/abhioncbr/docker-airflow)|[tags](https://cloud.docker.com/repository/docker/abhioncbr/docker-airflow/tags)|
## Airflow components stack
- Airflow version: Notation for representing version `XX.YY.ZZ`
- Execution Mode: `standalone`(simple container for exploration purpose, based on sqlite as airflow metadata db & SequentialExecutor ) or `prod`(single node based, LocalExecutor amd mysql as airflow metadata db) and `cluster` (for distributed production long run use-cases, container runs as either `server` or `worker` )
- Backend database: standalone- Sqlite, prod & cluster- Mysql
- Scheduler: standalone- Sequential, prod- LocalExecutor and Cluster- Celery
- Task queue: cluster- Redis
- Log location: local file system (Default) or AWS S3 (through `entrypoint-s3.sh`)
- User authentication: Password based & support for multiple users with `superuser` privilege.
- Code enhancement: password based multiple users supporting super-user(can see all dags of all owner) feature. Currently, Airflow is working on the password based multi user feature.
- Other features: support for google cloud platform packages in container.
## Airflow ports
- airflow portal port: 2222
- airflow celery flower: 5555
- redis port: 6379
- log files exchange port: 8793
## Airflow services information
- In server container: redis, airflow webserver & scheduler is running.
- In worker container: airflow worker & celery flower ui service is running.
## How to build images
* [DockerFile](docker-files/Dockerfile) uses `airflow-version` as a `build-arg`.
* build image, if you want to do some customization -
```shell
docker build -t abhioncbr/docker-airflow:$IMAGE_VERSION --build-arg AIRFLOW_VERSION=$AIRFLOW_VERSION
--build-arg AIRFLOW_PATCH_VERSION=$AIRFLOW_PATCH_VERSION -f ~/docker-airflow/docker-files/DockerFile .
```
* Arg IMAGE_VERSION value should be airflow version for example, 1.10.3 or 1.10.2
* Arg AIRFLOW_PATCH_VERSION value should be the major release version of airflow for example for 1.10.2 it should be 1.10.
## How to run using Kitmatic
* Simplest way for exploration purpose, using [Kitematic](https://kitematic.com)(Run containers through a simple, yet powerful graphical user interface.)
* Search abhioncbr/docker-airflow Image on [docker-hub](https://hub.docker.com/r/abhioncbr/docker-airflow/)
[
](search-docker-airflow-Kitematic.png)
* Start a container through Kitematic UI.
[
](run-docker-airflow-Kitematic.png)
## How to run
* General commands -
* starting airflow image as a `airflow-standalone` container in a standalone mode-
```shell
docker run --net=host -p 2222:2222 --name=airflow-standalone abhioncbr/airflow-XX.YY.ZZ -m=standalone &
```
* Starting airflow image as a `airflow-server` container in a cluster mode-
```shell
docker run --net=host -p 2222:2222 -p 6379:6379 --name=airflow-server \
abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=server -d=mysql://user:password@host:3306/db-name &
```
* Starting airflow image as a `airflow-worker` container in a cluster mode-
```shell
docker run --net=host -p 5555:5555 -p 8739:8739 --name=airflow-worker \
abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=worker -d=mysql://user:password@host:3306/db-name -r=redis://:6379/0 &
```
* In Mac using [docker for mac](https://docs.docker.com/docker-for-mac/install/) -
* Standalone Mode - starting airflow image in a standalone mode & mounting dags, code-artifacts & logs folder to host machine -
```shell
docker run -p 2222:2222 --name=airflow-standalone \
-v ~/airflow-data/code-artifacts:/code-artifacts \
-v ~/airflow-data/logs:/usr/local/airflow/logs \
-v ~/airflow-data/dags:/usr/local/airflow/dags \
abhioncbr/airflow-XX.YY.ZZ -m=standalone &
```
* Cluster Mode
* starting airflow image as a server container & mounting dags, code-artifacts & logs folder to host machine -
```shell
docker run -p 2222:2222 -p 6379:6379 --name=airflow-server \
-v ~/airflow-data/code-artifacts:/code-artifacts \
-v ~/airflow-data/logs:/usr/local/airflow/logs \
-v ~/airflow-data/dags:/usr/local/airflow/dags \
abhioncbr/airflow-XX.YY.ZZ \
-m=cluster -t=server -d=mysql://user:password@host.docker.internal:3306:3306/ &
```
* starting airflow image as a worker container & mounting dags, code-artifacts & logs folder to host machine -
```shell
docker run -p 5555:5555 -p 8739:8739 --name=airflow-worker \
-v ~/airflow-data/code-artifacts:/code-artifacts \
-v ~/airflow-data/logs:/usr/local/airflow/logs \
-v ~/airflow-data/dags:/usr/local/airflow/dags \
abhioncbr/airflow-XX.YY.ZZ \
-m=cluster -t=worker -d=mysql://user:password@host.docker.internal:3306:3306/ -r=redis://host.docker.internal:6379/0 &
```
[
](docker-airflow-entrypoint-args.png)
## Distributed execution of airflow
* As mentioned above, docker image of airflow can be leveraged to run in complete distributed run
* single docker-airflow container in `server` mode for serving the UI of the airflow, redis for celery task & scheduler.
* multiple docker-airflow containers in `worker` mode for executing tasks using celery executor.
* centralised airflow metadata database.
* Image below depicts the docker-airflow distributed platform:
[
](airflow-aws-deployment.png)