Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cdmikechen/airflow-dag-operator
Use K8s CustomResourceDefinition to define and create Airflow Dags
https://github.com/cdmikechen/airflow-dag-operator
airflow airflow-dags operator quarkus
Last synced: 19 days ago
JSON representation
Use K8s CustomResourceDefinition to define and create Airflow Dags
- Host: GitHub
- URL: https://github.com/cdmikechen/airflow-dag-operator
- Owner: cdmikechen
- Created: 2021-10-09T08:17:06.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-07-15T08:45:10.000Z (over 2 years ago)
- Last Synced: 2024-11-12T17:49:58.977Z (3 months ago)
- Topics: airflow, airflow-dags, operator, quarkus
- Language: Java
- Homepage:
- Size: 181 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Airflow Dag Operator
Use K8s [CustomResourceDefinition](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/)
to replace Airflow Git Sync strategy. The main idea of the project is to start a synchronization service
with [Quarkus Operator](https://github.com/quarkiverse/quarkus-operator-sdk)
on each airflow pod to synchronize the DAG/files into the DAG folder.This project has included the docker image packaging part (buildconfig or quarkus build)
and the modified helm chart template based on the official airflow project (https://github.com/apache/airflow/tree/main/chart).## Operation Strategy
* use `kubectl` to create rbac and `CustomResourceDefinition` (operator can automatically create resources)
* create some dag resource instances
* start quarkus operator
* operator can list dag resources or automatically perceive changes in resources, then create/update or delete dags in DAG folder.## CRD Schema
Resource description can be referred to [02-crd.yaml](example/02-crd.yaml) .
There are several important attributes in `CRD`, which are described here:Parameter | Description | Default
--- | --- | ---
`type` | Type of CRD, it can be `dag_file`, `file` or `dag_yaml`. `dag_file` must be a DAG description file. `file` can be a python or other text format file.`dag_yaml` reference [dag-factory](https://github.com/ajbosco/dag-factory), but add some changes | `dag_file`
`path`| File path. If the file path is empty, it defaults to the root directory of `dags`, otherwise it is a subdirectory under `dags` |
`file_name` | If `type` is `file`, we need a `file_name`. |
`dag_name` | If `type` is `dag_file` or `dag_yaml`, we need a `dag_name`. If `dag_name` don't have `.py` suffix, the operator will automatically append it. | crd name
`content`| If `type` is `dag_file` or `file`, It is the content of the file. |
`paused`| If `paused` is not empty, the operator will scan the DAG status and automatically pause / unpause the task. |
`dag_yaml`| The described of DAG by yaml, For details, please refer to [dag-factory](https://github.com/ajbosco/dag-factory) |## Dev Or Test
We can run our application in dev mode that enables live coding using:
```shell script
./mvnw compile quarkus:dev
```An example has been in `/example` folder. In `/example`, it includes `RBAC`,`CRD`, some cases and `Deployment` for test.
## Quarkus Image Build
If we use OpenShift, we can use `BuildConfig` or `Tekton/Pipline` to build a native image.
Otherwise, we can create a native executable using:
```shell script
./mvnw package -Pnative
# if use macOS, you should use -Dquarkus.native.container-build=true to build quarkus in docker with a linux environment
docker build -f src/main/docker/Dockerfile.native -t quarkus/airflow-dag-operator .
```Or, if we don't have GraalVM installed, we can run the native executable build in a container using:
```shell script
./mvnw package -Pnative -Dquarkus.native.container-build=true
docker build -f src/main/docker/Dockerfile.native -t quarkus/airflow-dag-operator .
```## Airflow Helm Chart
Helm dependency update to add postgresql chart and `lint`. We need helm3 to build.
```shell script
# dependency update
helm dep update# lint
helm lint# debug
helm install --dry-run --debug -f values.yaml airflow -n airflow .
```Deploy Chart
```shell script
# install
helm install -f values.yaml airflow -n airflow .# upgrade
helm upgrade -f values.yaml airflow -n airflow .# uninstall
helm uninstall airflow
```## Support pause
We need to rebuild the image
```shell
# if want to support pause, we need to build by change `quarkus.datasource.jdbc` from false to true
./mvnw package -Pnative -Dquarkus.datasource.jdbc=true
```Due to the complexity of parsing DAG's python codes, we need to ensure that `dag_name` and `dag_id` are consistent for now.
Note that the helm has not been modified yet right now! by design, the operator will only turn on `support pause` on the scheduler node to avoid repeated executions.# History
* 2021-10-09 1.0.0 First Commit
* 2021-11-16 1.0.1 Update to quarkus-operator-sdk 2.0.0
* 2022-03-27 1.0.1 Update to quarkus-operator-sdk 3.0.5
* 2022-07-15 1.0.2 Update to quarkus-operator-sdk 4.0.0.RC / Support paused