https://github.com/marcosmarxm/airflow-testing-ci-workflow
(project & tutorial) dag pipeline tests + ci/cd setup
- Host: GitHub
- URL: https://github.com/marcosmarxm/airflow-testing-ci-workflow
- Owner: marcosmarxm
- Created: 2020-09-22T10:48:17.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-02-11T22:43:03.000Z (over 4 years ago)
- Last Synced: 2025-03-18T02:36:51.285Z (3 months ago)
- Topics: airflow, airflow-cicd, airflow-testing, data-engineering, data-pipeline, project, tdd, testing
- Language: Python
- Homepage: https://blog.magrathealabs.com/how-to-develop-data-pipeline-in-airflow-through-tdd-test-driven-development-c3333439f358
- Size: 496 KB
- Stars: 86
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Airflow DAG development with tests + CI workflow
[CI workflow](https://github.com/marcosmarxm/airflow-testing-ci-workflow/actions?query=workflow%3ACI)
[MondayBuilding workflow](https://github.com/marcosmarxm/airflow-testing-ci-workflow/actions?query=workflow%3AMondayBuilding)

This code complements the article [How to develop data pipeline in Airflow through TDD (test-driven development)](https://blog.magrathealabs.com/how-to-develop-data-pipeline-in-airflow-through-tdd-test-driven-development-c3333439f358).
I suggest reading it to better understand the code and the reasoning behind the project setup.

[Step-by-step: How to develop a DAG using TDD (English version)](assets/how-to/create-dag-using-tdd.md)
[Passo-a-passo: Como desenvolver uma DAG usando TDD (Portuguese version)](assets/how-to/criar-dag-usando-tdd.md)

## The project
Below is a summary of what this project accomplishes. We simulate the transfer of fake transaction data from an e-commerce store: a simple task transfers data from the `oltp-db` database to the `olap-db` database.

To support development, we use a local environment to build the pipeline with tests, plus a Continuous Integration pipeline with GitHub Actions to ensure the tests run on every change.
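To make the idea concrete, here is a minimal sketch of a transfer task written together with its test. It is not the repository's code: the `transactions` table, its columns, and the function name `transfer_transactions` are hypothetical, and two in-memory SQLite databases stand in for `oltp-db` and `olap-db` so the example runs without any containers.

```python
import sqlite3

# Hypothetical table and columns, chosen only for this illustration.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS transactions (
    transaction_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount REAL NOT NULL
)
"""

SELECT_ALL = (
    "SELECT transaction_id, user_id, amount "
    "FROM transactions ORDER BY transaction_id"
)


def transfer_transactions(source, dest):
    """Copy every row of `transactions` from source to dest; return the row count."""
    rows = source.execute(SELECT_ALL).fetchall()
    dest.execute(CREATE_TABLE)
    dest.executemany(
        "INSERT INTO transactions (transaction_id, user_id, amount) VALUES (?, ?, ?)",
        rows,
    )
    dest.commit()
    return len(rows)


def test_transfer_copies_all_rows():
    # Arrange: in-memory databases play the roles of oltp-db and olap-db.
    oltp = sqlite3.connect(":memory:")
    olap = sqlite3.connect(":memory:")
    expected = [(1, 10, 9.99), (2, 11, 25.0), (3, 10, 4.5)]
    oltp.execute(CREATE_TABLE)
    oltp.executemany("INSERT INTO transactions VALUES (?, ?, ?)", expected)
    oltp.commit()

    # Act: run the task under test.
    copied = transfer_transactions(oltp, olap)

    # Assert: the destination holds exactly the expected rows.
    actual = olap.execute(SELECT_ALL).fetchall()
    assert copied == len(expected)
    assert actual == expected
```

In a TDD flow the test function would be written first and fail until `transfer_transactions` exists; `pytest` collects it automatically because its name starts with `test_`.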
**Containers**
- **airflow**: container running local setup for development;
- **oltp-db** and **olap-db**: containers that simulate the databases of a production environment and receive fake data.

In this tutorial we won't develop the dashboard part, only the pipeline.
### Dependencies?
Docker, docker-compose and make.

### How to run?
The command below sets up the environment using docker-compose. Wait a few minutes (240s, yeah omg right?) for Airflow to initialize its internal configuration; the command then creates the credentials and connections.
```bash
make setup
```
After running the above command, you can access Airflow at `localhost:8080`.
A test user is created (user: `admin` / password: `admin`). At this stage you can develop your DAGs and test them as you modify them.
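As an aside on the fixed wait during setup: one possible alternative is to poll until Airflow answers. The sketch below is not what `make setup` does. It assumes the webserver exposes Airflow's `/health` endpoint at `localhost:8080`; the helper names `airflow_is_healthy` and `wait_until` are hypothetical.

```python
import json
import time
import urllib.request


def airflow_is_healthy(url="http://localhost:8080/health"):
    """Return True when the webserver reports a healthy metadata database."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Not reachable yet (URLError is an OSError) or the body is not JSON.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("metadatabase", {}).get("status") == "healthy"


def wait_until(check, timeout=240.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True
    or `timeout` seconds have elapsed. Returns the final outcome."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage (blocks for up to 240 seconds):
#   ready = wait_until(airflow_is_healthy)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; the 240-second default simply mirrors the wait mentioned above.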
Finally, the command below calls `pytest` to run the tests.
```bash
make testing
```

---

Some resources about Airflow testing and DataOps:
* [Pipelines on pipelines: Agile CI/CD workflows for Airflow DAGs @ Airflow Summit 2020](https://www.youtube.com/watch?v=tY4F9X5l6dg)
* [Data Testing with Airflow](https://github.com/danielvdende/data-testing-with-airflow)
* [Data's Inferno: 7 Circles of Data Testing Hell with Airflow](https://medium.com/wbaa/datas-inferno-7-circles-of-data-testing-hell-with-airflow-cef4adff58d8)
* [Testing and Debugging in Apache Airflow by GoDataDriven](https://godatadriven.com/blog/testing-and-debugging-apache-airflow/)
* [The Challenge of Testing Data Pipelines](https://medium.com/slalom-build/the-challenge-of-testing-data-pipelines-4450744a84f1)
* [Automated Testing for Protecting Data Pipelines from Undocumented Assumptions](https://www.youtube.com/watch?v=z-kPgEAJCrA&ab_channel=Databricks)
* [Why Great Data Engineering Needs Automated Testing](https://medium.com/weareservian/why-data-engineering-needs-automated-testing-a37a0844d7db)
* [Testing in Airflow Part 1 - DAG validation tests, DAG definition tests and unit tests](https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c)
* [Testing in Airflow Part 2 - Integration Tests and e2e Pipeline Tests](https://medium.com/@chandukavar/testing-in-airflow-part-2-integration-tests-and-end-to-end-pipeline-tests-af0555cd1a82)